Re: Parsing one big multiple line .xml loaded in RDD using Python

2014-10-08 Thread jan.zikes
From: Davies Liu dav...@databricks.com To: jan.zi...@centrum.cz Date: 07.10.2014 17:38 Subject: Re: Parsing one big multiple line .xml loaded in RDD using Python CC: u...@spark.incubator.apache.org Maybe sc.wholeTextFiles() is what you want, you

Parsing one big multiple line .xml loaded in RDD using Python

2014-10-07 Thread jan.zikes
Hi, I have already asked a quite similar question on Stack Overflow, without success, specifically here: http://stackoverflow.com/questions/26202978/spark-and-python-trying-to-parse-wikipedia-using-gensim. I've also tried a workaround, again without success; the workaround problem can be

Re: Parsing one big multiple line .xml loaded in RDD using Python

2014-10-07 Thread Davies Liu
Maybe sc.wholeTextFiles() is what you want; you can get the whole text and parse it yourself. On Tue, Oct 7, 2014 at 1:06 AM, jan.zi...@centrum.cz wrote: Hi, I have already asked a quite similar question on Stack Overflow, without success, specifically here:
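
The suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the element names `page` and `title`, the helper names `extract_titles` and `titles_rdd`, and the use of the standard-library `xml.etree.ElementTree` parser are all assumptions. `sc` is taken to be an already created SparkContext. Note that `wholeTextFiles` yields each file as a single (path, content) record, so one very large file has to fit in the memory of a single worker.

```python
import xml.etree.ElementTree as ET


def extract_titles(path_and_text):
    """Parse one (path, content) pair and yield the text of each <title>."""
    _path, text = path_and_text
    root = ET.fromstring(text)
    # "page" and "title" are hypothetical element names; adjust them to the
    # actual schema (a real dump may also use XML namespaces).
    for page in root.iter("page"):
        title = page.find("title")
        if title is not None and title.text:
            yield title.text


def titles_rdd(sc, path):
    """Return an RDD of titles; `sc` is an existing SparkContext."""
    # wholeTextFiles gives one (filename, whole file content) pair per file,
    # so the multi-line XML document is not split line by line.
    return sc.wholeTextFiles(path).flatMap(extract_titles)
```

With a running context, `titles_rdd(sc, "hdfs://.../dump.xml").take(5)` would be one way to inspect the first few results; the parsing helper can also be tried locally on a plain (path, content) tuple without Spark.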