Re: sc.textFile can't recognize '\004'

2014-06-21 Thread anny9699
Thanks a lot, Sean! It works for me now.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/sc-textFile-can-t-recognize-004-tp8059p8071.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


sc.textFile can't recognize '\004'

2014-06-20 Thread anny9699
Hi,

I need to parse a file whose fields are delimited by several different
separators. I read it with SparkContext.textFile and ran into two problems:

1) One of the separators is '\004', which Python, R, and Hive all recognize.
Spark, however, does not seem to recognize it and returns a symbol that
looks like '?'. The symbol is not an actual question mark, and I don't know
how to split on it.

2) Some of the separators consist of several Chars, like } =. If I use
str.split(Array('}', '=')), it splits the string, but the result includes
many whitespace entries in the middle. Is there a good way to split by a
String instead of by an Array of Chars?
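
A minimal Scala sketch of both cases, under stated assumptions: the object
and method names (SeparatorSplit, splitOnChar, splitOnString) are
hypothetical and not part of Spark, and the separator } = is used exactly as
written above. The octal notation '\004' denotes the control character with
code 4, which can also be written as '\u0004'. A console will typically show
such a non-printable character as a placeholder glyph, which is probably the
'?'-like symbol described in 1).

```scala
import java.util.regex.Pattern

// Hypothetical helper object; the names are illustrative and not part of Spark.
object SeparatorSplit {
  // The control character written in octal notation as 004 has code point 4.
  val CtrlD: Char = 4.toChar

  // Split on a single character. Scala's split(Char) treats the character
  // literally, so no regex escaping is needed.
  def splitOnChar(line: String, sep: Char): Array[String] =
    line.split(sep)

  // Split on a whole multi-character separator. String.split(String) expects
  // a regular expression, so Pattern.quote turns the separator into a literal.
  def splitOnString(line: String, sep: String): Array[String] =
    line.split(Pattern.quote(sep))
}
```

With an RDD this could be applied as
sc.textFile(path).map(line => SeparatorSplit.splitOnChar(line, SeparatorSplit.CtrlD)),
where path is a placeholder. Splitting on Array('}', '=') treats each
character as its own delimiter, so whatever sits between them comes back as a
separate element, which would explain the stray whitespace pieces; splitting
on the quoted String avoids that.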

Thanks a lot!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/sc-textFile-can-t-recognize-004-tp8059.html