Re: XML Data Source for Spark

2016-04-25 Thread Mohamed ismail
Here is an example with code:
http://stackoverflow.com/questions/33078221/xml-processing-in-spark
I haven't tried it myself.

On Monday, April 25, 2016 1:06 PM, Jinan Alhajjaj wrote:

Hi All,

I am trying to use the XML data source (com.databricks.spark.xml) for parsing and querying XML data with Apache Spark SQL and DataFrames. I am using Apache Spark 1.6.1, with Java as the programming language. I wrote this sample code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    SparkConf conf = new SparkConf().setAppName("parsing").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);
    DataFrame df = sqlContext.read()
        .format("com.databricks.spark.xml")
        .option("rowTag", "page")   // each <page> element becomes one row
        .load("file.xml");
When I run this code, I get the following exception:

    Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
        at com.databricks.spark.xml.XmlOptions.<init>(XmlOptions.scala:26)
        at com.databricks.spark.xml.XmlOptions$.apply(XmlOptions.scala:48)
        at com.databricks.spark.xml.DefaultSource.createRelation(DefaultSource.scala:58)
        at com.databricks.spark.xml.DefaultSource.createRelation(DefaultSource.scala:44)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
        at datbricxml.parsing.main(parsing.java:16)

Please help: I need to fix this error for my senior project ASAP.
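The `$conforms` method named in the trace exists in the Scala 2.11 standard library but not in 2.10, so this NoSuchMethodError usually means a library compiled for Scala 2.11 is running on a Scala 2.10 runtime (the prebuilt Spark 1.6.1 downloads use Scala 2.10). The usual remedy is to pick the spark-xml artifact whose `_2.xx` suffix matches the Spark build, e.g. `spark-xml_2.10`. As a quick check, the plain-Java sketch below (a hypothetical helper, not part of Spark or spark-xml) lists the Scala binary suffixes of the jars on the classpath. It assumes dependencies arrive as individual Maven-style jars such as `spark-core_2.10-1.6.1.jar`; it cannot see inside an assembly jar.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: which Scala binary versions appear among the classpath jars? */
public class ScalaSuffixCheck {

    // Matches the Scala binary suffix in artifact names like "spark-core_2.10-1.6.1.jar"
    private static final Pattern SUFFIX = Pattern.compile("_(2\\.\\d+)-");

    static Set<String> scalaSuffixes(List<String> classpathEntries) {
        Set<String> suffixes = new TreeSet<>();
        for (String entry : classpathEntries) {
            String fileName = new File(entry).getName(); // drop the directory part
            Matcher m = SUFFIX.matcher(fileName);
            if (fileName.endsWith(".jar") && m.find()) {
                suffixes.add(m.group(1)); // e.g. "2.10" or "2.11"
            }
        }
        return suffixes;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                System.getProperty("java.class.path").split(File.pathSeparator));
        Set<String> found = scalaSuffixes(entries);
        System.out.println("Scala binary suffixes on classpath: " + found);
        if (found.size() > 1) {
            System.out.println("Mixed Scala builds: align every _2.xx artifact with your Spark build.");
        }
    }
}
```

Run it with the same classpath as the failing job (for example the IDE run configuration). If it reports both 2.10 and 2.11, switching the spark-xml dependency to the `_2.10` artifact, or moving the whole project to a Scala 2.11 build of Spark, should make the error go away.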
  

  

NumberFormatException: For input string: "0.00000"

2016-09-19 Thread Mohamed ismail
Hi all,

I am trying to read a comma-separated file with:

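    sc.textFile(DataFile).mapPartitions(lines => {
      // one parser per partition, reused for every line in it
      val parser = new CSVParser(",")
      // parseLineToTuple is my own function (not shown here)
      lines.map(line => parseLineToTuple(line, parser))
    })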
The data looks like this:

    android phone,0,0,0,,0,0,0,0,0,0,0,5,0,0,0,5,0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0
    ios phone,0,-1,0,,0,0,0,0,0,0,1,0,0,0,0,1,0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0
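Scanning the two sample rows shows which columns an integer-only parser would reject. The sketch below is hypothetical: it uses plain Java `String.split` instead of the `CSVParser` from the post and assumes no field contains a quoted comma. Column 0 is the label ("android phone"); it flags the columns holding decimals and the empty fifth field.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical helper: find column indices whose values match a test in any row. */
public class ColumnScan {

    static final String[] SAMPLE = {
        "android phone,0,0,0,,0,0,0,0,0,0,0,5,0,0,0,5,0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0",
        "ios phone,0,-1,0,,0,0,0,0,0,0,1,0,0,0,0,1,0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0"
    };

    static Set<Integer> columnsWhere(String[] rows, Predicate<String> test) {
        Set<Integer> cols = new TreeSet<>();
        for (String row : rows) {
            // limit -1 also keeps trailing empty fields, so indices always line up
            String[] fields = row.split(",", -1);
            for (int i = 0; i < fields.length; i++) {
                if (test.test(fields[i])) {
                    cols.add(i);
                }
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        // prints "decimal columns: [18, 19, 20, 21, 22, 30]"
        System.out.println("decimal columns: " + columnsWhere(SAMPLE, f -> f.contains(".")));
        // prints "empty columns: [4]"
        System.out.println("empty columns: " + columnsWhere(SAMPLE, String::isEmpty));
    }
}
```

Any of those columns that `parseLineToTuple` converts to an integer will fail on these rows.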

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 23055.0 failed 4 times, most recent failure: Lost task 1.3 in stage 23055.0 (TID 191607, ):
    java.lang.NumberFormatException: For input string: "0.0"
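This message means an integer parser received a decimal string (in Scala, `"0.0".toInt` delegates to `java.lang.Integer.parseInt`). Since `parseLineToTuple` is not shown, the following assumes it converts every numeric field with `toInt`. One option is to declare the decimal columns as `Double`; another is to parse leniently. The sketch below uses hypothetical helper names and is written in Java to match the first thread. It accepts `"0.0"` or `"0.00000"` as 0 but still rejects values like `"1.5"` instead of silently truncating them, and it maps empty fields to a caller-supplied default, which is an assumption about how missing values should be treated.

```java
import java.math.BigDecimal;

/** Hypothetical lenient field parsers for CSV columns. */
public class LenientFields {

    /** Integer field that may be written as "5", "-1", "0.0", "0.00000" or left empty. */
    static int toInt(String raw, int defaultIfEmpty) {
        String s = raw.trim();
        if (s.isEmpty()) {
            return defaultIfEmpty; // assumption: empty means "missing"
        }
        try {
            return Integer.parseInt(s); // fast path for plain integers
        } catch (NumberFormatException e) {
            // "0.0" lands here. intValueExact() throws ArithmeticException if the
            // fractional part is non-zero; non-numeric text still throws NumberFormatException.
            return new BigDecimal(s).intValueExact();
        }
    }

    /** Field that is really a decimal, e.g. "0.0". */
    static double toDouble(String raw, double defaultIfEmpty) {
        String s = raw.trim();
        return s.isEmpty() ? defaultIfEmpty : Double.parseDouble(s);
    }

    public static void main(String[] args) {
        System.out.println(toInt("0.0", 0));      // 0
        System.out.println(toInt("-1", 0));       // -1
        System.out.println(toInt("", 0));         // 0
        System.out.println(toInt("0.00000", 0));  // 0
        System.out.println(toDouble("0.0", 0.0)); // 0.0
    }
}
```

Failing loudly on `"1.5"` is deliberate: if such values ever appear, that column should be read as a double rather than forced into an int.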

Has anyone faced this issue? Is there a solution?

Thanks,
Mohamed