Re: [PD] Fastest way to find lines in text file

cyrille henry Wed, 22 Mar 2017 08:47:15 -0700

If your text file is composed of 2 columns of numbers, you can optimize the search with 
some preprocessing.


1 : sort by the index column (already done in your example)
2 : create 2 tables: the start index, and the number of occurrences, of each index value
in your example, the "start index table" would be 0 at 345594, 5 at 345595, 15 
at 345596, 16 at 345598
the "number of occurrences table" would be: 5 at 345594, 10 at 345595, 1 
at 345596, 4 at 345598
3 : put column 2 of your text file in a "data table"

now, when searching for 345595, you just have to [tabread table1] and [tabread 
table2] at position 345595 (here start 5 and count 10), and with a small until 
loop you read the data table only where needed (entries 5 to 14).
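
The three steps above can be sketched in Python (not Pd) to check the table contents. This is only an illustration under assumptions: `build_tables` and `lookup` are hypothetical names, and Python dicts stand in for the two Pd tables read with [tabread], which would be indexed by position instead.

```python
# Sketch of the "start index" / "number of occurrences" / "data" tables.
# Assumption: input lines are already sorted by the first column.

SAMPLE = """\
345594 577427
345594 567267
345594 528911
345594 534435
345594 523087
345595 374384
345595 377303
345595 380544
345595 379911
345595 557020
345595 552396
345595 562487
345595 460842
345595 428449
345595 424095
345596 447676
345598 579883
345598 379495
345598 379039
345598 380328
"""


def build_tables(lines):
    """Return (start, count, data) built from 'key value' lines."""
    start = {}  # key -> position of its first entry in data
    count = {}  # key -> number of entries with that key
    data = []   # column 2, in file order
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, value = (int(field) for field in line.split())
        if key not in start:
            start[key] = len(data)
            count[key] = 0
        count[key] += 1
        data.append(value)
    return start, count, data


def lookup(key, start, count, data):
    """Read the data table only where needed for this key."""
    if key not in start:
        return []
    first = start[key]
    return data[first:first + count[key]]


if __name__ == "__main__":
    start, count, data = build_tables(SAMPLE.splitlines())
    print(start[345595], count[345595])       # 5 10
    print(lookup(345596, start, count, data))  # [447676]
```

The preprocessing pass reads the whole file once; after that, each search costs two table reads plus one read per matching line.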

cheers
c

On 22/03/2017 at 14:34, Jack wrote:

I guess my 2 previous mails were clear enough.
But I will answer each point:

1) My previous mails:
I need to find every line of a text file containing a word.
The text file has 2,539,592 lines.
Right now, I am using [msgfile] from zexy because I can find a line, skip a
line and find again ... until the end of the text file.
But I am wondering if there is another object (in another library)
that is faster and specialized in this work?
...
The text file has only two "strings" per line.
Here are 20 lines of the text file:

345594 577427
345594 567267
345594 528911
345594 534435
345594 523087
345595 374384
345595 377303
345595 380544
345595 379911
345595 557020
345595 552396
345595 562487
345595 460842
345595 428449
345595 424095
345596 447676
345598 579883
345598 379495
345598 379039
345598 380328

2) See above
3) See above
4) See above
5) Linux/Ubuntu 16.10/Pd 0.47.1
6) You're pushing it :)

++

Jack




On 22/03/2017 at 13:31, Lorenzo Sutton wrote:

Hi,

On 22/03/2017 13:01, Jack wrote:

I need to find all instances that match in the first column.
That is not possible with [text search], if I am right.


I think you should outline your use case/problem in more detail. This
should be a good practice when asking for support on the Mailing List.

Example:

1) I have a text file where each line contains two integers separated
by a space (" ") char - such as (possibly paste a part of the file on
pastebin or similar too).
213214 12313
123223 13213

2) My file is [always/at least/circa/ ...] 2,539,592 lines long

3) My algorithm should find all subsequent lines matching the first line
in the file and return [all line numbers for matches / the total count
of matched lines / ...]

3) I want the algorithm to be [as fast as possible / run in under 1
second / run in under 1ms / ... ]

4) I [want to / do not need to] use Pd Vanilla

5) My patch should run on [All platforms / Windows / OSX / Linux / ...]

6) My patch should run [on potentially any machine / on a Raspberry Pi /
on a 1990s 386 machine / on my digital toaster where I have compiled a
custom version of Pd / ... ]

:)

++

Jack



On 22/03/2017 at 08:27, Liam Goodacre wrote:

You can also use [text search], although it's not so easy to find more
than the first instance. If you don't mind taking an extra step, you
could give each line a third term, which is the line number. Then you
can use the "> 3" argument for [text search] to find matches s
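
The line-number idea can be illustrated with a rough Python sketch (not Pd code). Each row carries its line number as a third term, and every search asks for the first row that matches the key and has a line number greater than the previous hit. `add_line_numbers`, `find_next` and `find_all` are hypothetical helpers; how [text search] itself is configured is not shown here.

```python
# Sketch: find every match by repeating a "first match after line N" search.

def add_line_numbers(rows):
    """Append the line number as a third term to each (key, value) row."""
    return [(key, value, n) for n, (key, value) in enumerate(rows)]


def find_next(rows3, key, after_line):
    """Return the first row matching key whose line number is > after_line."""
    for k, value, n in rows3:
        if k == key and n > after_line:
            return (k, value, n)
    return None


def find_all(rows3, key):
    """Repeat find_next, each time starting after the previous hit."""
    hits = []
    last = -1  # so that line 0 can match on the first search
    while True:
        hit = find_next(rows3, key, last)
        if hit is None:
            break
        hits.append(hit)
        last = hit[2]
    return hits


if __name__ == "__main__":
    rows = add_line_numbers([(345594, 577427), (345595, 374384), (345594, 567267)])
    print(find_all(rows, 345594))
```

In this sketch each call rescans from the top, so it only illustrates the matching rule, not a fast implementation.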



------------------------------------------------------------------------
*From:* Pd-list <pd-list-boun...@lists.iem.at> on behalf of Jack
<j...@rybn.org>
*Sent:* 21 March 2017 18:14
*To:* pd-list@lists.iem.at
*Subject:* [PD] Fastest way to find lines in text file

Hello,

I need to find every line of a text file containing a word.
The text file has 2,539,592 lines.
Right now, I am using [msgfile] from zexy because I can find a line, skip a
line and find again ... until the end of the text file.
But I am wondering if there is another object (in another library)
that is faster and specialized in this work?
Thanx.
++

Jack


_______________________________________________
Pd-list@lists.iem.at mailing list
UNSUBSCRIBE and account-management ->
https://lists.puredata.info/listinfo/pd-list


