Command line vector to sequence file
Hi,

I am looking for a simple way to convert vectors to a sequence file from the command line. For example, I have a file data.txt that contains these vectors:

1,1
2,1
1,2
2,2
3,3
8,8
8,9
9,8
9,9

So is there a command-line way to convert that into a sequence file?

I tried mahout seqdirectory, but afterwards hdfs dfs -text output2/part-m-0 gives me something like:

/data.txt1,1
2,1
1,2
2,2
3,3
8,8
8,9
9,8
9,9

and that is not the sequence file format, as I understand it.

I know there is a Java API, but I am looking for a command-line solution.

--
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE "(serialNumber=37303140314)"
-BEGIN PUBLIC KEY-
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvbeg7LwEC2SCpAEewwpC3ajxE
5ZsRMCB77L8bae9G7TslgLkoIzo9yOjPdx2NN6DllKbV65UjTay43uUDyql9g3tl
RhiJIcoAExkSTykWqAIPR88LfilLy1JlQ+0RD8OXiWOVVQfhOHpQ0R/jcAkM2lZa
BjM8j36yJvoBVsfOHQIDAQAB
-END PUBLIC KEY-
Re: Command line vector to sequence file
Hi,

I did the same search a few weeks back and found that there is nothing in the current API to do that from the command line.

However, I did write a Java program that transforms a CSV into a SequenceFile, which can be used to train a naive Bayes classifier (amongst other things).

Here are the sources: https://gist.github.com/kmoulart/9616125

You'll find all you need to build a runnable jar with dependencies and a proper command line (using JCommander). Both the sequential version and the MapReduce one are in the given files.

If you're lazy, I'll put the whole Maven project on my GitHub later today.

Hope it helps you,
Kévin Moulart

2014-03-18 9:41 GMT+01:00 Margusja:
> So is there command line possibility to convert that into sequence file?
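To make the CSV-to-vector step concrete, here is a minimal sketch of the parsing such a converter needs. It is not the code from the gist; the class name CsvVectorParser and its methods are hypothetical, and it uses only the Java standard library, so it stops short of writing an actual SequenceFile. Assuming Hadoop and Mahout are on the classpath, each parsed vector would then be appended to a SequenceFile.Writer as a key plus a VectorWritable value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: parses "x,y,..." lines into numeric vectors. */
final class CsvVectorParser {

    /** Splits one comma-separated line into its double components. */
    static double[] parseLine(String line) {
        String[] fields = line.trim().split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    /** Parses every non-blank line; the list index can serve as the record key. */
    static List<double[]> parseAll(List<String> lines) {
        List<double[]> vectors = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                vectors.add(parseLine(line));
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        // A few rows in the same format as data.txt from the question.
        List<String> lines = Arrays.asList("1,1", "2,1", "8,9");
        List<double[]> vectors = parseAll(lines);
        for (int key = 0; key < vectors.size(); key++) {
            // Assumption: in a real converter this pair would be written to a
            // Hadoop SequenceFile instead of being printed.
            System.out.println(key + "\t" + Arrays.toString(vectors.get(key)));
        }
    }
}
```

Using the line index as the key is only one possible choice; a row label or the source file name would work just as well, depending on what the downstream job expects.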
Re: Command line vector to sequence file
Thank you, I am going to try it.

Best regards, Margus (Margusja) Roo

On 18/03/14 10:58, Kevin Moulart wrote:
> Here are the sources :
> https://gist.github.com/kmoulart/9616125
Re: Command line vector to sequence file
You're welcome!

Here's the repository if need be: https://github.com/kmoulart/hadoop_mahout_utils

Kévin Moulart

2014-03-18 10:00 GMT+01:00 Margusja:
> Thank you, I am going to try it.