In article <[EMAIL PROTECTED]>,
John Nagle <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>> Thanks to all who replied. It's very appreciated.
>>
>> Yes, I had to double check line counts and the number of lines is ~16
>> million (instead of stated 1.6B).
>
>OK, that's not bad at all.
>
In article <[EMAIL PROTECTED]>,
<[EMAIL PROTECTED]> wrote:
>Thanks to all who replied. It's very appreciated.
>
>Yes, I had to double-check line counts and the number of lines is ~16
>million (instead of the stated 1.6B).
>
>Also:
>
>>What is a "Unicode text file"? How is it encoded: utf8, utf16, utf1
On Sun, 27 Jan 2008 10:00:45 +, Grant Edwards wrote:
> On 2008-01-27, Stefan Behnel <[EMAIL PROTECTED]> wrote:
>> Gabriel Genellina wrote:
>>> use the Windows sort command. It has been
>>> there since MS-DOS ages, there is no need to download and install other
>>> packages, and the documentati
On 2008-01-27, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Gabriel Genellina wrote:
>> use the Windows sort command. It has been
>> there since MS-DOS ages, there is no need to download and install other
>> packages, and the documentation at
>> http://technet.microsoft.com/en-us/library/bb491004.asp
Gabriel Genellina wrote:
> use the Windows sort command. It has been
> there since MS-DOS ages, there is no need to download and install other
> packages, and the documentation at
> http://technet.microsoft.com/en-us/library/bb491004.aspx says:
>
> Limits on file size:
> The sort command has no
On Fri, 25 Jan 2008 17:50:17 -0200, Paul Rubin
<"http://phr.cx"@NOSPAM.invalid> wrote:
> Nicko <[EMAIL PROTECTED]> writes:
>> # The next line is order O(n) in the number of chunks
>> (line, fileindex) = min(mergechunks)
>
> You should use the heapq module to make this operation O(log
Nicko <[EMAIL PROTECTED]> writes:
> # The next line is order O(n) in the number of chunks
> (line, fileindex) = min(mergechunks)
You should use the heapq module to make this operation O(log n) instead.
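A minimal sketch of what that suggestion could look like, assuming each chunk is an already-sorted iterable of lines. The function name `merge_sorted` is made up for illustration and is not from Nicko's code; the heap entries mirror the `(line, fileindex)` tuples in the quoted snippet.

```python
import heapq


def merge_sorted(chunks):
    """Merge already-sorted iterables of lines using a heap.

    Hypothetical helper: picking the smallest pending line costs
    O(log n) in the number of chunks, instead of O(n) with min().
    """
    iterators = [iter(chunk) for chunk in chunks]

    # Seed the heap with the first line of every non-empty chunk.
    heap = []
    for index, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first, index))
    heapq.heapify(heap)

    while heap:
        line, index = heap[0]  # smallest pending line sits at the root
        yield line
        following = next(iterators[index], None)
        if following is None:
            heapq.heappop(heap)  # this chunk is exhausted
        else:
            # Swap the root for the next line from the same chunk.
            heapq.heapreplace(heap, (following, index))
```

In practice the standard library's `heapq.merge` already does this kind of merge, so the hand-written loop is mainly useful for seeing where the O(log n) step comes from.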
--
http://mail.python.org/mailman/listinfo/python-list
On Jan 24, 9:26 pm, [EMAIL PROTECTED] wrote:
> > If you really have a 2GB file and only 2GB of RAM, I suggest that you don't
> > hold your breath.
>
> I am limited with resources. Unfortunately.
As long as you have at least as much disc space spare as you need to
hold a copy of the file then this
On Jan 25, 9:23 am, Asim <[EMAIL PROTECTED]> wrote:
> On Jan 24, 4:26 pm, [EMAIL PROTECTED] wrote:
>
>
>
> > Thanks to all who replied. It's very appreciated.
>
> > Yes, I had to double-check line counts and the number of lines is ~16
> > million (instead of the stated 1.6B).
>
> > Also:
>
> > >What is
On Jan 24, 4:26 pm, [EMAIL PROTECTED] wrote:
> Thanks to all who replied. It's very appreciated.
>
> Yes, I had to double-check line counts and the number of lines is ~16
> million (instead of the stated 1.6B).
>
> Also:
>
> >What is a "Unicode text file"? How is it encoded: utf8, utf16, utf16le,
> >u
John Nagle <[EMAIL PROTECTED]> writes:
> > Unix sort does external sorting when needed.
>
> Ah, someone finally put that in. Good. I hadn't looked at
> "sort"'s manual page in many years.
Huh? It has been like that from the beginning. It HAD to be. Unix
was originally written on a PDP-11.
Paul Rubin wrote:
> John Nagle <[EMAIL PROTECTED]> writes:
>> - Get enough memory to do the sort with an in-memory sort, like
>> UNIX "sort" or Python's "sort" function.
>
> Unix sort does external sorting when needed.
Ah, someone finally put that in. Good. I hadn't looked at "sort"
John Nagle <[EMAIL PROTECTED]> writes:
> - Get enough memory to do the sort with an in-memory sort, like
> UNIX "sort" or Python's "sort" function.
Unix sort does external sorting when needed.
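For readers who have not met the term, here is a rough Python sketch of the general idea behind external sorting: sort chunks that fit in memory, write each to a temporary file, then merge the sorted chunks. This is not how Unix sort is implemented. The names `external_sort`, `sort_key`, and `lines_per_chunk` are made up for illustration, and the key (first two characters) and UTF-8 encoding are taken from the original question in this thread.

```python
import heapq
import itertools
import os
import tempfile


def sort_key(line):
    # The original question asks to sort on the first two characters.
    return line[:2]


def external_sort(in_path, out_path, lines_per_chunk=1_000_000):
    """Sort a UTF-8 text file that may not fit in memory (illustrative)."""
    chunk_paths = []
    try:
        # Pass 1: read a bounded number of lines, sort them in memory,
        # and spill each sorted chunk to its own temporary file.
        with open(in_path, encoding="utf-8", newline="") as src:
            while True:
                chunk = list(itertools.islice(src, lines_per_chunk))
                if not chunk:
                    break
                # Make sure a final line without a newline stays separate.
                chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
                chunk.sort(key=sort_key)
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Pass 2: merge the sorted chunk files lazily, line by line.
        files = [open(p, encoding="utf-8", newline="") for p in chunk_paths]
        try:
            with open(out_path, "w", encoding="utf-8", newline="") as dst:
                dst.writelines(heapq.merge(*files, key=sort_key))
        finally:
            for f in files:
                f.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

The chunk size is a guess to be tuned against available RAM; the number of chunks is also bounded by how many files the operating system lets a process keep open at once.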
[EMAIL PROTECTED] wrote:
> Thanks to all who replied. It's very appreciated.
>
> Yes, I had to double check line counts and the number of lines is ~16
> million (instead of stated 1.6B).
OK, that's not bad at all.
You have a few options:
- Get enough memory to do the sort with an in
On Thursday 24 January 2008 20:56 John Nagle wrote:
> [EMAIL PROTECTED] wrote:
>> Hello all,
>>
>> I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
>> to sort based on the first two characters.
>
> Given those numbers, the average number of characters per line is
> less tha
On Jan 25, 8:26 am, [EMAIL PROTECTED] wrote:
> I need to isolate all lines that start with two characters (zz to be
> particular)
What does "isolate" mean to you? What does this have to do with
sorting? What do you actually want to do with (a) the lines starting
with "zz" (b) the other lines? Wh
Stefan Behnel wrote:
> [EMAIL PROTECTED] wrote:
>>> What are you going to do with it after it's sorted?
>> I need to isolate all lines that start with two characters (zz to be
>> particular)
>
> "Isolate" as in "extract"? Remove the rest?
>
> Then why don't you extract the lines first, without so
[EMAIL PROTECTED] wrote:
>> What are you going to do with it after it's sorted?
> I need to isolate all lines that start with two characters (zz to be
> particular)
"Isolate" as in "extract"? Remove the rest?
Then why don't you extract the lines first, without sorting the file? (or sort
it afterw
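A small sketch of the "extract first" idea, assuming the file is UTF-8 (as the poster states elsewhere in the thread) and that "isolate" means copying the matching lines to a separate file. The function name `extract_prefixed` and both paths are hypothetical.

```python
def extract_prefixed(in_path, out_path, prefix="zz"):
    """Copy lines starting with `prefix` to out_path; return how many."""
    count = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        # Iterating over the file reads one line at a time, so memory
        # use stays small regardless of the file size.
        for line in src:
            if line.startswith(prefix):
                dst.write(line)
                count += 1
    return count
```

This is a single pass over the input with no sorting at all; if the extracted lines still need ordering, they are likely a much smaller set to sort afterwards.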
Thanks to all who replied. It's very appreciated.
Yes, I had to double-check line counts and the number of lines is ~16
million (instead of the stated 1.6B).
Also:
>What is a "Unicode text file"? How is it encoded: utf8, utf16, utf16le,
>utf16be, ??? If you don't know, do this:
The file is UTF-8
>
On Jan 25, 6:18 am, [EMAIL PROTECTED] wrote:
> Hello all,
>
> I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
> to sort based on the first two characters.
If you mean 1.6 American billion i.e. 1.6 * 1000 ** 3 lines, and 2 *
1024 ** 3 bytes of data, that's 1.34 bytes per line. If
[EMAIL PROTECTED] wrote:
> Hello all,
>
> I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
> to sort based on the first two characters.
Given those numbers, the average number of characters per line is
less than 2. Please check.
John
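The sanity check can be reproduced with a couple of lines of Python. The 2 GB and 1.6 billion figures come from the original post; the 16 million figure is the poster's later correction elsewhere in this thread. The helper name `bytes_per_line` is made up for illustration.

```python
def bytes_per_line(file_bytes, line_count):
    """Average number of bytes per line, newline included."""
    return file_bytes / line_count


file_bytes = 2 * 1024 ** 3        # ~2 GB, as stated in the question
stated_lines = 1.6 * 1000 ** 3    # 1.6 billion lines, as first stated
corrected_lines = 16_000_000      # ~16 million, per the later correction

print(round(bytes_per_line(file_bytes, stated_lines), 2))     # 1.34
print(round(bytes_per_line(file_bytes, corrected_lines), 1))  # 134.2
```

An average of about 1.3 bytes per line leaves almost no room for content once the newline is counted, which is why the stated line count looked wrong; about 134 bytes per line is plausible for ordinary text.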
[EMAIL PROTECTED] writes:
> I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
> to sort based on the first two characters.
>
> I'd greatly appreciate it if someone could post sample code that can help
> me do this.
Use the unix sort command:
sort inputfile -o outputfile
I think
Hello all,
I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
to sort based on the first two characters.
I'd greatly appreciate it if someone could post sample code that can help
me do this.
Also, any idea approximately how long the sort process is going to
take (XP, Dual Core 2.0