> In any case the streaming utility seems quite like what would be good for
> Sage. What do I have to do to take a file with, say, a lot of random
> numbers and then use a Sage script to compute the square of each? Can you
> give me a brief description?

Here's a way to perform that task using Sage and hadoop-streaming.
Note this is simply a mapping example, where each line of input has
one thing done to it and no real reduction is needed afterwards, so
I'm using `/usr/bin/sort` as a trivial reducer just to collect (and
sort) the mapped results.

Given a file of numbers, each on its own line, like so:
1
2
3
4
5
6
7
8
9
10
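
If you don't already have such a file, a small script along these lines
(run with the same sage binary; the filename and the range of the random
numbers are made up just for illustration) should produce one:

#!/home/akm/Applications/sage-from-source/sage-4.7.2/sage

# Write 10 random integers, one per line, to a local file
import random
f = open("random-numbers", "w")
for _ in range(10):
    f.write(str(random.randint(1, 100)) + "\n")
f.close()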

Put that file in a directory in your running HDFS:
hadoop fs -mkdir sage-test
hadoop fs -put random-numbers sage-test/random-numbers
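
If you want to double-check that the file landed where the job will be
looking for it:

hadoop fs -ls sage-test
hadoop fs -cat sage-test/random-numbers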

Write an executable sage script, with the path to your sage install
corrected for your environment:
~$ cat sage-test
#!/home/akm/Applications/sage-from-source/sage-4.7.2/sage

# Accept stdin and calculate square of the number on each line

import sys
for line in sys.stdin:
  line = int(line.strip())
  print str(line)+":"+str(line*line)
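
It's worth sanity-checking the mapper outside Hadoop first by piping the
input file straight through it, which mimics what the streaming job does:

~$ chmod +x sage-test
~$ ./sage-test < random-numbers | sort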

Run your job:
$ hadoop jar hadoop-0.20.203.0/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -input sage-test -output sage-output \
    -mapper /home/akm/sage-test -reducer /usr/bin/sort
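
Note the mapper above is a local path, so on a multi-node cluster the
script (and a sage install at the shebang path) would have to exist on
every node that runs tasks. Streaming's -file option can ship the script
with the job instead; if I remember correctly the invocation looks
roughly like this:

$ hadoop jar hadoop-0.20.203.0/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -input sage-test -output sage-output \
    -mapper sage-test -reducer /usr/bin/sort \
    -file /home/akm/sage-test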

$ hadoop fs -ls sage-output
Found 3 items
-rw-r--r--   1 akm supergroup          0 2012-02-24 10:13 /user/akm/sage-output/_SUCCESS
drwxr-xr-x   - akm supergroup          0 2012-02-24 10:13 /user/akm/sage-output/_logs
-rw-r--r--   1 akm supergroup         59 2012-02-24 10:13 /user/akm/sage-output/part-00000

$ hadoop fs -cat sage-output/part-00000
10:100
1:1
2:4
3:9
4:16
5:25
6:36
7:49
8:64
9:81
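
To pull the result back out of HDFS into an ordinary local file (the
name squares.txt is just an example), something like this works:

$ hadoop fs -getmerge sage-output squares.txt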
