Re: streaming example has error
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:161)
        at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:542)
        at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:601)
        at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:600)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
        at $iwC$$iwC$$iwC.<init>(<console>:64)
        at $iwC$$iwC.<init>(<console>:66)
        at $iwC.<init>(<console>:68)
        at <init>(<console>:70)
        at .<init>(<console>:74)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

scala> ssc.awaitTermination()

On Wed, Jun 15, 2016 at 8:53 PM, David Newberger <david.newber...@wandcorp.com> wrote:

> Have you tried to “set spark.driver.allowMultipleContexts = true”?
>
> David Newberger
>
> From: Lee Ho Yeung [mailto:jobmatt...@gmail.com]
> Sent: Tuesday, June 14, 2016 8:34 PM
> To: user@spark.apache.org
> Subject: streaming example has error
>
> When I simulate streaming with nc -lk I get the error below. I then tried the example:
>
> martin@ubuntu:~/Downloads$ /home/martin/Downloads/spark-1.6.1/bin/run-example streaming.NetworkWordCount localhost
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/06/14 18:33:06 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
> 16/06/14 18:33:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/06/14 18:33:06 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.157.134 instead (on interface eth0)
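
The exception above is self-describing: the DStream graph has no output operation registered, so ssc.start() refuses to run. Independent of the allowMultipleContexts suggestion, registering one output operation addresses this particular failure. A minimal sketch; only the wordCounts.print() line is new relative to the code quoted later in this thread:

    // Given the pipeline from the original post, ending in:
    //   val wordCounts = pairs.reduceByKey(_ + _)
    // register an output operation before starting the context:
    wordCounts.print()   // without at least one output op, start() throws
                         // "No output operations registered, so nothing to execute"
    ssc.start()
    ssc.awaitTermination()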
can spark help to prevent memory error for itertools.combinations(initlist, 2) in python script
I wrote a Python script that calls itertools.combinations(initlist, 2), but it hits a memory error once initlist has more than 14,000 elements. Is it possible to use Spark to do this work? I have seen that yatel can do it. Do Spark and yatel use the hard disk as memory? If so, what needs to change in the Python code?
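
Spark cannot give one Python process more memory, but it can hold the pair set as a distributed dataset whose partitions spill to disk, instead of one in-memory list. A minimal Scala sketch of the idea (the script in question is Python; the same cartesian-plus-filter shape works in PySpark, and the index-based trick below is an illustration, not code from this thread):

    // Assumes a running spark-shell, where `sc` is provided. Stand-in data;
    // the real initlist (14,000+ elements) would replace the range.
    val initlist = sc.parallelize(1 to 14000).zipWithIndex()

    // cartesian() produces all ordered pairs as a distributed RDD; keeping
    // only index i < j leaves each unordered pair exactly once, mirroring
    // itertools.combinations(initlist, 2).
    val pairs = initlist.cartesian(initlist)
      .filter { case ((_, i), (_, j)) => i < j }
      .map { case ((a, _), (b, _)) => (a, b) }

    pairs.count()   // ~98 million pairs, never materialized in one process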
Re: can not show all data for this table
Hi Mich,

I have found the cause of my problem now: I had not set the delimiter, which is a tab. But when I set it, I get an error. I also notice that only LibreOffice opens and reads the file well; even Excel on Windows cannot separate it into the right columns.

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("delimiter", "").load("/home/martin/result002.csv")
java.lang.StringIndexOutOfBoundsException: String index out of range: 0

On Wed, Jun 15, 2016 at 12:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> there may be an issue with data in your csv file. like blank header line etc.
>
> sounds like you have an issue there. I normally get rid of blank lines before putting csv file in hdfs.
>
> can you actually select from that temp table. like
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance, AccountName, AccountNumber from tmp").take(2)
>
> replace those with your column names. they are mapped using case class
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
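
For what it's worth, spark-csv takes the first character of the "delimiter" option string, so an empty value would fail with exactly this StringIndexOutOfBoundsException before any data is read. A minimal sketch that writes the tab as an escape sequence (path and other options taken from the message above):

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", "\t")   // the tab as an escape, not a literal tab
      .load("/home/martin/result002.csv")

    df.printSchema()   // should now list a0 ... a9 as ten separate columns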
Re: can not show all data for this table
Hi Mich,

https://drive.google.com/file/d/0Bxs_ao6uuBDUQ2NfYnhvUl9EZXM/view?usp=sharing
https://drive.google.com/file/d/0Bxs_ao6uuBDUS1UzTWd1Q2VJdEk/view?usp=sharing

This time I made sure the headers cover all of the data (some columns that have headers simply contain no data), but I still cannot show all the data the way LibreOffice does when it opens the file.

/home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
df.filter($"a3".contains("found deep=1"))

On Tue, Jun 14, 2016 at 9:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> there may be an issue with data in your csv file. like blank header line etc.
>
> sounds like you have an issue there. I normally get rid of blank lines before putting csv file in hdfs.
>
> can you actually select from that temp table. like
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance, AccountName, AccountNumber from tmp").take(2)
>
> replace those with your column names. they are mapped using case class
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
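
Two things appear to be going on in the session above: without a delimiter option the whole tab-separated header becomes one column, and df.filter(...) by itself only builds a new DataFrame; nothing asks Spark to display rows. A minimal sketch, assuming the file is tab-separated:

    val parsed = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", "\t")
      .load("/home/martin/result002.csv")

    // filter() is lazy and returns a new DataFrame; call show() (or
    // collect()) to actually print the matching rows.
    parsed.filter(parsed("a3").contains("found deep=1")).show()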
Re: can not show all data for this table
filter also has an error:

16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as sc.
SQL context available as sqlContext.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@3114ea

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/home/martin/result002.csv")
16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
Java HotSpot(TM) Client VM warning: You have loaded library /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
df: org.apache.spark.sql.DataFrame = [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9: string]

scala> df.printSchema()
root
 |-- a0 a1 a2 a3 a4 a5 a6 a7 a8 a9: string (nullable = true)

scala> df.registerTempTable("sales")

scala> df.filter($"a0".contains("found deep=1")).filter($"a1".contains("found deep=1")).filter($"a2".contains("found deep=1"))
org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input columns: [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9];
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)

On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jobmatt...@gmail.com> wrote:

> After trying the following commands, I cannot show the data.
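
The AnalysisException is consistent with the delimiter never being applied: the entire tab-separated header line became the name of a single string column, so no column a0 exists to resolve. A diagnostic sketch, again assuming the tab-separated layout:

    // Print each column name in brackets; one bracketed name holding all
    // ten headers confirms the line was never split on tabs.
    df.columns.foreach(c => println(s"[$c]"))

    // With the delimiter set, the same chained filters resolve and run:
    val fixed = sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .option("delimiter", "\t")
      .load("/home/martin/result002.csv")
    fixed.filter(fixed("a0").contains("found deep=1"))
      .filter(fixed("a1").contains("found deep=1"))
      .filter(fixed("a2").contains("found deep=1"))
      .show()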
streaming example has error
When I simulate streaming with nc -lk I get the error below. I then tried the example:

martin@ubuntu:~/Downloads$ /home/martin/Downloads/spark-1.6.1/bin/run-example streaming.NetworkWordCount localhost
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/06/14 18:33:06 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
16/06/14 18:33:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/06/14 18:33:06 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.157.134 instead (on interface eth0)
16/06/14 18:33:06 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/06/14 18:33:13 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes

It got an error too. This is what I ran:

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", )
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
ssc.start()
ssc.awaitTermination()

scala> import org.apache.spark._
import org.apache.spark._

scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._

scala> import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.StreamingContext._

scala> val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@67bcaf

scala> val ssc = new StreamingContext(conf, Seconds(1))
16/06/14 18:28:44 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
        at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.eclipse.jetty.server.Server.doStart(Server.java:293)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:252)
        at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:262)
        at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:262)
        at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1988)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1979)
        at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:262)
        at org.apache.spark.ui.WebUI.bind(WebUI.scala:136)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:481)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:481)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:481)
        at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:874)
        at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
        at $line37.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
        at $line37.$read$$iwC$$iwC$$iwC.<init>(<console>:61)
        at $line37.$read$$iwC$$iwC.<init>(<console>:63)
        at $line37.$read$$iwC.<init>(<console>:65)
        at $line37.$read.<init>(<console>:67)
        at $line37.$read$.<init>(<console>:71)
        at $line37.$read$.<clinit>(<console>)
        at $line37.$eval$.<init>(<console>:7)
        at $line37.$eval$.<clinit>(<console>)
        at $line37.$eval.$print(<console>)
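
The BindException is a symptom of constructing a second SparkContext inside spark-shell, which already provides one as sc. A minimal sketch that reuses it; the 9999 port is an assumption, since the port argument is missing from the socketTextStream call above:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Build the StreamingContext from the shell's existing SparkContext so
    // no second context (and no second web-UI port) is created.
    val ssc = new StreamingContext(sc, Seconds(1))

    // socketTextStream needs an explicit port; start `nc -lk 9999` first.
    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    wordCounts.print()   // an output operation is required before start()
    ssc.start()
    ssc.awaitTermination()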
can not show all data for this table
After trying the following commands, I cannot show the data:

https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing

/home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
val aggDF = sqlContext.sql("select * from sales where a0 like \"%deep=3%\"")
df.collect.foreach(println)
aggDF.collect.foreach(println)

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
sqlContext.sql("select * from sales").take(30).foreach(println)
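
Given the replies earlier in this digest, the likely reason neither query shows usable data is the missing tab delimiter: the whole line is parsed as one column, so a0 is not a real column and select * prints everything fused together. A minimal end-to-end sketch under that assumption (single quotes in the SQL sidestep the escaped double quotes in the LIKE clause):

    val df = sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", "\t")
      .load("/home/martin/result002.csv")
    df.registerTempTable("sales")

    // show() prints a readable table; take(30).foreach(println) works too.
    sqlContext.sql("select * from sales where a0 like '%deep=3%'").show(30)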