> In any case the streaming utility seems quite like what would be good for
> Sage. What do I have to do to take a file with, say, a lot of random
> numbers and then use a Sage script to compute the square of each? Can you
> give me a brief description?
Here's a way to perform that task using Sage and hadoop-streaming. Note that this is simply a mapping example, where only one thing is done to each line of input and no real reduction is needed afterwards, so I'm using `/usr/bin/sort` as a trivial reducer just to collect (and sort) the mapped results.

Given a file of numbers, each on its own line, like so:

1
2
3
4
5
6
7
8
9
10

Put that file in a directory in your running HDFS:

hadoop fs -mkdir sage-test
hadoop fs -put random-numbers sage-test/random-numbers

Write an executable Sage script, with the path to your Sage install corrected for your environment:

~$ cat sage-test
#!/home/akm/Applications/sage-from-source/sage-4.7.2/sage
# Accept stdin and print the square of the number on each line
import sys
for line in sys.stdin:
    line = int(line.strip())
    print str(line) + ":" + str(line * line)

Run your job:

$ hadoop jar hadoop-0.20.203.0/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -input sage-test -output sage-output \
    -mapper /home/akm/sage-test -reducer /usr/bin/sort

$ hadoop fs -ls sage-output
Found 3 items
-rw-r--r--   1 akm supergroup          0 2012-02-24 10:13 /user/akm/sage-output/_SUCCESS
drwxr-xr-x   - akm supergroup          0 2012-02-24 10:13 /user/akm/sage-output/_logs
-rw-r--r--   1 akm supergroup         59 2012-02-24 10:13 /user/akm/sage-output/part-00000

$ hadoop fs -cat sage-output/part-00000
10:100
1:1
2:4
3:9
4:16
5:25
6:36
7:49
8:64
9:81

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
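P.S. Before submitting the job, you can sanity-check the pipeline locally, since streaming just pipes lines through the mapper and then through the reducer. The sketch below (plain Python 3 rather than the Sage script above, and using an in-memory stream instead of a real input file) mimics what the mapper plus a `sort` reducer produce:

```python
# Local dry run of the streaming pipeline: map each number to "n:n*n",
# then sort the mapped lines the way the /usr/bin/sort reducer would.
import io


def mapper(stream):
    # Mirrors the Sage mapper: square the number on each input line.
    for line in stream:
        n = int(line.strip())
        yield "%d:%d" % (n, n * n)


if __name__ == "__main__":
    data = io.StringIO("\n".join(str(i) for i in range(1, 11)) + "\n")
    # sorted() is lexicographic, like sort(1), so "10:100" precedes "1:1".
    for record in sorted(mapper(data)):
        print(record)
```

The lexicographic ordering also explains why `10:100` appears first in `part-00000` above: the keys are compared as strings, not as numbers.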