[REBOL] Re: Large file compare

Tom Conlin Wed, 08 Jun 2005 15:04:37 -0700

I don't have time to try or test any of this, so some of the logic may be
reversed, but might a single-pass approach help?



;;; a and b are sorted blocks (ascending, same ordering)
in-both: copy []
only_a: copy []
only_b: copy []

while [all [not tail? a  not tail? b]] [
    either equal? first a first b [
        ;; same item in both blocks: record it and advance both
        insert/only tail in-both first a
        a: next a
        b: next b
    ][
        either lesser? first a first b [
            ;; the smaller item cannot occur later in the other block
            insert/only tail only_a first a  a: next a
        ][
            insert/only tail only_b first b  b: next b
        ]
    ]
]
;;; in case one block finishes before the other
while [not tail? a] [
    insert/only tail only_a first a
    a: next a
]
while [not tail? b] [
    insert/only tail only_b first b
    b: next b
]
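
The same single-pass idea can also run without loading either file: hold
one current line per file and advance whichever side is smaller. What
follows is only a minimal sketch, written in Python rather than REBOL
because it has not been checked against REBOL's port functions. The name
`compare_sorted` and the tags "both", "only_a" and "only_b" are made up for
illustration, and the sketch assumes both inputs are sorted ascending in
the same order that Python's `<` gives for the lines.

```python
def compare_sorted(a_lines, b_lines):
    """Yield (tag, line) pairs from two sorted iterables in a single pass.

    tag is "both", "only_a" or "only_b". Only one line per input is held
    at a time, so the inputs may be open file objects.
    """
    a_iter, b_iter = iter(a_lines), iter(b_lines)
    a = next(a_iter, None)
    b = next(b_iter, None)
    while a is not None and b is not None:
        if a == b:
            yield ("both", a)
            a = next(a_iter, None)
            b = next(b_iter, None)
        elif a < b:
            # a is smaller, so it cannot appear later in the sorted b
            yield ("only_a", a)
            a = next(a_iter, None)
        else:
            # b is smaller, so it cannot appear later in the sorted a
            yield ("only_b", b)
            b = next(b_iter, None)
    # drain whichever input still has lines left
    while a is not None:
        yield ("only_a", a)
        a = next(a_iter, None)
    while b is not None:
        yield ("only_b", b)
        b = next(b_iter, None)
```

Passing two open file objects (for example the thread's testfile1.txt and
testfile2.txt) and writing each yielded line straight to one of three output
files keeps memory use roughly constant. Lines read from a file keep their
trailing newline, so a final line without one may need stripping before it
compares equal.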



On 6/8/05, Thorsten Moeller <[EMAIL PROTECTED]> wrote:
>
> Hi Gabriele,
>
> good hints. So, I now use read/lines instead of read, write out the
> result of the difference operation immediately, and remove it from
> memory. This drops the actual memory consumption to 140 MB during
> intersect and 120 MB during difference.
>
> But I still think it will become too big when operating on the whole
> set.
>
> As the file content is very trivial, like "2348246864;PCINIT2", and can be
> sorted, I am thinking of something like stepping through the files line by
> line and comparing the line content. The file that has the lead in the
> comparison will alternate, depending on whether there is a difference in
> the first or second column. This only works when both files are sorted
> identically.
>
> Memory consumption will be minimal. But what I don't know is
> which commands to use, as they must remember the positions in the files.
>
> I will think about this further. Perhaps you have a good idea how this could
> be implemented. What I also don't know is whether this will be fast enough.
>
> Thanks
>
> Thorsten
>
>
> On Wed, 8 Jun 2005 12:31:56 +0200, "Gabriele Santilli"
> <[EMAIL PROTECTED]> said:
> >
> > Hi Thorsten,
> >
> > On Wednesday, June 8, 2005, 11:53:08 AM, you wrote:
> >
> > TM> a: read %testfile1.txt
> > TM> b: read %testfile2.txt
> >
> > Did you mean READ/LINES?
> >
> > TM> inboth: intersect a b
> > TM> only_a: difference inboth a
> > TM> only_b: difference inboth b
> >
> > TM> My question is whether there are better ways in REBOL to achieve the
> > TM> same with less memory consumption?
> >
> > Yes - don't load the whole files in memory. :-)
> >
> > Is the difference going to be big too? If so you may want to avoid
> > keeping it in memory too.
> >
> > OTOH, if you have enough memory for the operation, doing it all in
> > memory is going to be much faster.
> >
> > Regards,
> >    Gabriele.
> > --
> > Gabriele Santilli <[EMAIL PROTECTED]>  --  REBOL Programmer
> > Amiga Group Italia sez. L'Aquila  ---   SOON: http://www.rebol.it/
> >
> > --
> > To unsubscribe from the list, just send an email to
> > lists at rebol.com with unsubscribe as the subject.
> >
> >
> --
>   Melian Solutions
>   Thorsten Moeller
>
>   Mail: [EMAIL PROTECTED]


-- 
   ... nice weather   eh
