Re: [gentoo-user] join two tab-separate-value files without join field

Tim Fri, 23 May 2008 22:06:19 -0700

Zhang Weiwu wrote:
> Hi.
> 
> I got a datasheet from my colleague in MS Excel format and I intend to
> process that file with my awk/sed knowledge. The problem is: he sent me
> two Excel files each with 2134 records, in fact there should be only one
> excel file with 2134 rows and 295 columns, but MS Excel can only handle
> 256 data columns, so he split the datasheet vertically so he can manage
> to send to me.
> 
> Now I saved both file to tab-separated-value format, how do I join them?
> 
> I could have used join(1) but that require a join field, an ID of some
> sort. I think of this:
> 
> $  grep -n '' left.tsv | sed 's/:/\t/'> left.forjoin
> $ grep -n '' right.tsv | sed 's/:/\t/'> right.forjoin
> $ join -t "    " left.forjoin right.forjoin > result.tsv
> (note that for join's -t parameter somehow I need to manage to get a tab
> between the quotes)
> 
> Yes I achieved what I want, but that looks complex. Is there a simpler
> way? Thanks in advance.
> 
> I know OpenOffice 3.0 can handle up to 1024 data columns. It's difficult
> to convince anyone to switch to OOO because here in China MS Office
> costs only 0$. I also could use OOO3.0 for doing the join but I wish to
> know the commandline way:)
> 
Got perl?


#!/usr/bin/perl

if($#ARGV < 1) {
        print "Arguments: <file1> <file2>\n";
        exit(1);
}

open(FIRSTFILE, $ARGV[0]);
open(SECONDFILE, $ARGV[1]);
@first = <FIRSTFILE>;
@second = <SECONDFILE>;

$i = 0;
for($i = 0;$i < 2; $i++) {
        $tmp1 = $first[$i];
        $tmp1 =~ s/\n//g;
        $tmp2 = $second[$i];
        $tmp2 =~ s/\n//g;

        $str = $tmp1 . "\t" . $tmp2 . "\n";
        print $str;
}

close(FIRSTFILE);
close(SECONDFILE);

This is likely not the best or fastest way to do it, and I don't have a
dataset as large as yours readily available for testing, but it seems to
work.

-Tim
-- 
gentoo-user@lists.gentoo.org mailing list

Re: [gentoo-user] join two tab-separate-value files without join field

Reply via email to