Not a Julia solution, but there are standard UNIX command-line tools for this: comm <https://en.wikipedia.org/wiki/Comm> and uniq <https://en.wikipedia.org/wiki/Uniq>. That's how I would go about doing this kind of work, at least initially: it's what they're made for and they're very efficient.
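As a rough sketch of the comm/uniq route: the file names and contents below are made up for illustration, and comm needs its inputs sorted, so each file goes through sort first.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny sample files (hypothetical names and contents).
printf 'pear\napple\nfig\napple\n' > a.txt
printf 'fig\nkiwi\napple\n' > b.txt

# Use plain byte ordering so sort and comm agree on what "sorted" means.
export LC_ALL=C

# sort -u sorts and drops duplicate lines in one pass.
sort -u a.txt > a.sorted
sort -u b.txt > b.sorted

comm -12 a.sorted b.sorted > both.txt     # lines present in both files
comm -23 a.sorted b.sorted > only_a.txt   # lines only in a.txt
comm -13 a.sorted b.sorted > only_b.txt   # lines only in b.txt

# uniq -d prints one copy of each line that occurs more than once.
sort a.txt | uniq -d > repeated.txt

cat both.txt
```

For real data, drop the two printf lines and point sort at your own files; the rest should carry over unchanged.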
If you want to do this stuff in Julia, however, it's pretty easy too. You'll make extensive use of Sets for this: read the lines from one file and add each of them to a Set; then read the other file and look each line up in the Set to see if it was present. E.g. the following script prints the lines present in both of the files given as arguments:

    #!/usr/bin/env julia
    # Written for Julia 0.4; on 0.5 and later use String instead of UTF8String.

    length(ARGS) == 2 || error("two arguments expected")

    const lines = Set{UTF8String}()

    # Collect every line of the first file, without its trailing newline.
    open(ARGS[1]) do f
        for line in eachline(f)
            push!(lines, chomp(line))
        end
    end

    # Print each line of the second file that also occurs in the first.
    open(ARGS[2]) do f
        for line in eachline(f)
            line = chomp(line)
            if line in lines
                println(line)
            end
        end
    end

You can also do these sorts of things in a more library-like way:

    # Read all lines from a stream into a Set, dropping trailing newlines.
    function lines(io::IO)
        line_set = Set{UTF8String}()
        for line in eachline(io)
            push!(line_set, chomp(line))
        end
        return line_set
    end

    # Convenience method: open the file at `path` and read its lines.
    lines(path::AbstractString) = open(lines, path)

    julia> w1 = lines("/usr/share/dict/words") # OS X system dictionary
    Set(UTF8String["diseaseful","xenyl","Dezaley","ironheartedly","nimbused","ungoverned","tarantass","hatlessness","titration","photosynthesis" … "ponderous","shorten","metroptosia","detractiveness","microbium","boater","navette","tridiametral","notekin","infidelic"])

    julia> w2 = lines("/Users/stefan/tmp/words") # dictionary copied from a Linux system
    Set(UTF8String["rearrangement","pintoes","dial's","inattentive","pewee's","photosynthesis","sleepwalking","caring's","cirrhosis's","entomb" … "affidavit's","boater","deli's","gray's","Concetta","vituperates","overtaxing","graybeard","barrenest","Nevadans"])

    julia> length(w1)
    235886

    julia> length(w2)
    99171

    julia> w1 ∩ w2
    Set(UTF8String["confined","baleful","rearrangement","piecemeal","irreplaceable","shortbread","waster","null","staphylococcus","indelicacy" … "uncut","boater","resell","joint","wavering","munitions","graybeard","treacherous","upsurge","oblique"])

    julia> length(w1 ∩ w2)
    35077

On a computer with a decent amount of RAM, dealing with data that's hundreds of MB shouldn't be a problem.
On Mon, Feb 22, 2016 at 8:13 AM, barbara.g <barbara.gucc...@gmail.com> wrote:

> Hi !
>
> I have just stepped into Julia, I didn't know about it (her...) before.
>
> I must handle plain text file, sized many hundreds of MB, and above, each;
> the goal is to remove duplicate lines, to find lines present in both of
> two, to find lines present in one and not in another, etc.
>
> I was used to do such operations in Mathematica because it perfectly fitts
> my needs when files are small enough to be loaded entirely in RAM, but when
> they grow up the jobs become practically infeacible.
>
> Can Julia provide an (the more or less) out the box solution or, at least,
> an easily programmable one ?
>
> Your sincerely !
>