Re: streaming example has error

2016-06-15 Thread Lee Ho Yeung
a:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.IllegalArgumentException: requirement failed: No output
operations registered, so nothing to execute
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:161)
at
org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:542)
at
org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:601)
at
org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:600)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:39)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:44)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:46)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:48)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:50)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:52)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:54)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:56)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:58)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:60)
at $iwC$$iwC$$iwC$$iwC.(:62)
at $iwC$$iwC$$iwC.(:64)
at $iwC$$iwC.(:66)
at $iwC.(:68)
at (:70)
at .(:74)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org
$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org
$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


scala> ssc.awaitTermination()
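
The "No output operations registered" requirement fails because the streaming code above only builds transformations; at least one output operation (print, saveAsTextFiles, foreachRDD, ...) has to be registered before ssc.start(). A minimal sketch of the missing piece, assuming a listener port of 9999 (the real port was stripped from the mail):

val lines = ssc.socketTextStream("localhost", 9999)   // 9999 is an assumed port
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

wordCounts.print()        // an output operation; without one, start() throws
                          // "No output operations registered"
ssc.start()
ssc.awaitTermination()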


On Wed, Jun 15, 2016 at 8:53 PM, David Newberger <
david.newber...@wandcorp.com> wrote:

> Have you tried to “set spark.driver.allowMultipleContexts = true”?
>
>
>
> *David Newberger*
>
>
>
> *From:* Lee Ho Yeung [mailto:jobmatt...@gmail.com]
> *Sent:* Tuesday, June 14, 2016 8:34 PM
> *To:* user@spark.apache.org
> *Subject:* streaming example has error
>
>
>
> When I simulate streaming with nc -lk , I get the error below.
>
> Then I tried the example:
>
> martin@ubuntu:~/Downloads$
> /home/martin/Downloads/spark-1.6.1/bin/run-example
> streaming.NetworkWordCount localhost 
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 16/06/14 18:33:06 INFO StreamingExamples: Setting log level to [WARN] for
> streaming example. To override add a custom log4j.properties to the
> classpath.
> 16/06/14 18:33:06 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 16

can spark help to prevent memory error for itertools.combinations(initlist, 2) in python script

2016-06-15 Thread Lee Ho Yeung
I wrote a Python script that uses itertools.combinations(initlist, 2),

but it hits a memory error when the number of elements in initlist goes over 14,000.

Is it possible to use Spark to do this work?

I have seen that yatel can do this; do Spark and yatel use the hard disk as memory?

If so, what needs to change in the Python code?
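
Spark can express this as a self-cartesian on an RDD, which produces the pairs partition by partition across the cluster (spilling to disk as needed) instead of building one huge in-memory list. A rough sketch in Scala for the spark-shell; the same pattern works from PySpark with rdd.cartesian. The init data below is a hypothetical stand-in for initlist:

// Hypothetical stand-in for initlist.
val init = sc.parallelize(1 to 14000).zipWithIndex()   // (value, index)

// All unordered pairs (i < j keeps each pair exactly once). 14,000 elements
// give roughly 98 million pairs, generated and processed in a distributed
// way rather than held in a single Python list.
val pairs = init.cartesian(init)
  .filter { case ((_, i), (_, j)) => i < j }
  .map { case ((a, _), (b, _)) => (a, b) }

println(pairs.count())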


Re: can not show all data for this table

2016-06-15 Thread Lee Ho Yeung
Hi Mich,

I have found the cause of my problem now: I had missed setting the delimiter, which is a tab.

But that got an error too,

and I notice that only LibreOffice opens and reads the file well; even Excel
on Windows cannot separate the columns into the right format.

scala> val df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").option("delimiter",
"").load("/home/martin/result002.csv")
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
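
The StringIndexOutOfBoundsException suggests spark-csv received an empty delimiter string (a literal tab pasted between quotes is easy to lose in e-mail). A minimal sketch, passing the tab as an escape sequence instead:

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "\t")          // tab written as an escape sequence
  .load("/home/martin/result002.csv")

df.printSchema()                      // a0 ... a9 should now be separate columns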


On Wed, Jun 15, 2016 at 12:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:

> there may be an issue with data in your csv file. like blank header line
> etc.
>
> sounds like you have an issue there. I normally get rid of blank lines
> before putting csv file in hdfs.
>
> can you actually select from that temp table. like
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance,
> AccountName, AccountNumber from tmp").take(2)
>
> replace those with your column names. they are mapped using case class
>
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 15 June 2016 at 03:02, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>
>> filter also has error
>>
>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> Spark context available as sc.
>> SQL context available as sqlContext.
>>
>> scala> import org.apache.spark.sql.SQLContext
>> import org.apache.spark.sql.SQLContext
>>
>> scala> val sqlContext = new SQLContext(sc)
>> sqlContext: org.apache.spark.sql.SQLContext =
>> org.apache.spark.sql.SQLContext@3114ea
>>
>> scala> val df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
>> UseCompressedOops is set; assuming yes
>> Java HotSpot(TM) Client VM warning: You have loaded library
>> /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
>> disabled stack guard. The VM will try to fix the stack guard now.
>> It's highly recommended that you fix the library with 'execstack -c
>> ', or link it with '-z noexecstack'.
>> df: org.apache.spark.sql.DataFrame = [a0a1a2a3a4a5
>> a6a7a8a9: string]
>>
>> scala> df.printSchema()
>> root
>>  |-- a0a1a2a3a4a5a6a7a8a9: string
>> (nullable = true)
>>
>>
>> scala> df.registerTempTable("sales")
>>
>> scala> df.filter($"a0".contains("found
>> deep=1")).filter($"a1".contains("found
>> deep=1")).filter($"a2".contains("found deep=1"))
>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
>> columns: [a0a1a2a3a4a5a6a7a8a9];
>> at
>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>
>>
>>
>>
>>
>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jobmatt...@gmail.com>
>> wrote:
>>
>>> after tried following commands, can not show data
>>>
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>
>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
>>> com.databricks:spark-csv_2.11:1.4.0
>>>
>>> import org.apache.spark.sql.SQLContext
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> val aggDF = sqlContext.sql("select * from sales where a0 like
>>> \"%deep=3%\"")
>>> df.collect.foreach(println)
>>> aggDF.collect.foreach(println)
>>>
>>>
>>>
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> sqlContext.sql("select * from sales").take(30).foreach(println)
>>>
>>
>>
>


Re: can not show all data for this table

2016-06-15 Thread Lee Ho Yeung
Hi Mich,

https://drive.google.com/file/d/0Bxs_ao6uuBDUQ2NfYnhvUl9EZXM/view?usp=sharing
https://drive.google.com/file/d/0Bxs_ao6uuBDUS1UzTWd1Q2VJdEk/view?usp=sharing

This time I made sure the headers cover all the data; only some columns that
have headers contain no data.

But it still cannot show all the data the way LibreOffice does when I open the file.

/home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
com.databricks:spark-csv_2.11:1.4.0
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
df.filter($"a3".contains("found deep=1"))
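
Note that filter() is lazy: it only returns a new DataFrame, so nothing is displayed until an action runs. A small sketch, assuming the tab delimiter from the other message is set so a3 resolves as its own column:

import sqlContext.implicits._

val hits = df.filter($"a3".contains("found deep=1"))
hits.show(50, false)        // print up to 50 matching rows without truncation
println(hits.count())       // or just count the matches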





On Tue, Jun 14, 2016 at 9:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> there may be an issue with data in your csv file. like blank header line
> etc.
>
> sounds like you have an issue there. I normally get rid of blank lines
> before putting csv file in hdfs.
>
> can you actually select from that temp table. like
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance,
> AccountName, AccountNumber from tmp").take(2)
>
> replace those with your column names. they are mapped using case class
>
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 15 June 2016 at 03:02, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>
>> filter also has error
>>
>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> Spark context available as sc.
>> SQL context available as sqlContext.
>>
>> scala> import org.apache.spark.sql.SQLContext
>> import org.apache.spark.sql.SQLContext
>>
>> scala> val sqlContext = new SQLContext(sc)
>> sqlContext: org.apache.spark.sql.SQLContext =
>> org.apache.spark.sql.SQLContext@3114ea
>>
>> scala> val df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
>> UseCompressedOops is set; assuming yes
>> Java HotSpot(TM) Client VM warning: You have loaded library
>> /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
>> disabled stack guard. The VM will try to fix the stack guard now.
>> It's highly recommended that you fix the library with 'execstack -c
>> ', or link it with '-z noexecstack'.
>> df: org.apache.spark.sql.DataFrame = [a0a1a2a3a4a5
>> a6a7a8a9: string]
>>
>> scala> df.printSchema()
>> root
>>  |-- a0a1a2a3a4a5a6a7a8a9: string
>> (nullable = true)
>>
>>
>> scala> df.registerTempTable("sales")
>>
>> scala> df.filter($"a0".contains("found
>> deep=1")).filter($"a1".contains("found
>> deep=1")).filter($"a2".contains("found deep=1"))
>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
>> columns: [a0a1a2a3a4a5a6a7a8a9];
>> at
>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>
>>
>>
>>
>>
>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jobmatt...@gmail.com>
>> wrote:
>>
>>> after tried following commands, can not show data
>>>
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>
>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
>>> com.databricks:spark-csv_2.11:1.4.0
>>>
>>> import org.apache.spark.sql.SQLContext
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> val aggDF = sqlContext.sql("select * from sales where a0 like
>>> \"%deep=3%\"")
>>> df.collect.foreach(println)
>>> aggDF.collect.foreach(println)
>>>
>>>
>>>
>>> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> sqlContext.sql("select * from sales").take(30).foreach(println)
>>>
>>
>>
>


Re: can not show all data for this table

2016-06-14 Thread Lee Ho Yeung
The filter also gets an error:

16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
4040. Attempting port 4041.
Spark context available as sc.
SQL context available as sqlContext.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext =
org.apache.spark.sql.SQLContext@3114ea

scala> val df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").load("/home/martin/result002.csv")
16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
UseCompressedOops is set; assuming yes
Java HotSpot(TM) Client VM warning: You have loaded library
/tmp/libnetty-transport-native-epoll7823347435914767500.so which might have
disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c
', or link it with '-z noexecstack'.
df: org.apache.spark.sql.DataFrame = [a0a1a2a3a4a5
a6a7a8a9: string]

scala> df.printSchema()
root
 |-- a0a1a2a3a4a5a6a7a8a9: string
(nullable = true)


scala> df.registerTempTable("sales")

scala> df.filter($"a0".contains("found
deep=1")).filter($"a1".contains("found
deep=1")).filter($"a2".contains("found deep=1"))
org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
columns: [a0a1a2a3a4a5a6a7a8    a9    ];
    at
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)




On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jobmatt...@gmail.com> wrote:

> after tried following commands, can not show data
>
>
> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>
> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>
> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
> com.databricks:spark-csv_2.11:1.4.0
>
> import org.apache.spark.sql.SQLContext
>
> val sqlContext = new SQLContext(sc)
> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
> df.printSchema()
> df.registerTempTable("sales")
> val aggDF = sqlContext.sql("select * from sales where a0 like
> \"%deep=3%\"")
> df.collect.foreach(println)
> aggDF.collect.foreach(println)
>
>
>
> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").load("/home/martin/result002.csv")
> df.printSchema()
> df.registerTempTable("sales")
> sqlContext.sql("select * from sales").take(30).foreach(println)
>


streaming example has error

2016-06-14 Thread Lee Ho Yeung
When I simulate streaming with nc -lk , I get the error below.

Then I tried the example:

martin@ubuntu:~/Downloads$
/home/martin/Downloads/spark-1.6.1/bin/run-example
streaming.NetworkWordCount localhost 
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
16/06/14 18:33:06 INFO StreamingExamples: Setting log level to [WARN] for
streaming example. To override add a custom log4j.properties to the
classpath.
16/06/14 18:33:06 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
16/06/14 18:33:06 WARN Utils: Your hostname, ubuntu resolves to a loopback
address: 127.0.1.1; using 192.168.157.134 instead (on interface eth0)
16/06/14 18:33:06 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
16/06/14 18:33:13 WARN SizeEstimator: Failed to check whether
UseCompressedOops is set; assuming yes


The code below got an error too:

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
val conf = new
SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", )
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
ssc.start()
ssc.awaitTermination()



scala> import org.apache.spark._
import org.apache.spark._

scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._

scala> import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.StreamingContext._

scala> val conf = new
SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@67bcaf

scala> val ssc = new StreamingContext(conf, Seconds(1))
16/06/14 18:28:44 WARN AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at
org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:252)
at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:262)
at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:262)
at
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1988)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1979)
at
org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:262)
at org.apache.spark.ui.WebUI.bind(WebUI.scala:136)
at
org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:481)
at
org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:481)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkContext.(SparkContext.scala:481)
at
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:874)
at
org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:81)
at
$line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36)
at
$line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:41)
at
$line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
at
$line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:45)
at
$line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:47)
at
$line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:49)
at
$line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51)
at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:53)
at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:55)
at $line37.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:57)
at $line37.$read$$iwC$$iwC$$iwC$$iwC.(:59)
at $line37.$read$$iwC$$iwC$$iwC.(:61)
at $line37.$read$$iwC$$iwC.(:63)
at $line37.$read$$iwC.(:65)
at $line37.$read.(:67)
at $line37.$read$.(:71)
at $line37.$read$.()
at $line37.$eval$.(:7)
at $line37.$eval$.()
at $line37.$eval.$print()
at 
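
The BindException above comes from constructing a second SparkContext inside spark-shell, which already owns one (and its web UI bound to port 4040). A small sketch of the usual workaround in the 1.6 shell: reuse the existing sc instead of building a new SparkConf:

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Reuse the shell's SparkContext rather than new SparkConf()/new SparkContext,
// so a second UI never tries to bind to port 4040.
val ssc = new StreamingContext(sc, Seconds(1))

Setting spark.driver.allowMultipleContexts=true (as suggested elsewhere in the thread) only silences the multiple-context check; reusing sc avoids the duplicate context altogether.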

can not show all data for this table

2016-06-14 Thread Lee Ho Yeung
After trying the following commands, I cannot show the data.

https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing

/home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
com.databricks:spark-csv_2.11:1.4.0

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
val aggDF = sqlContext.sql("select * from sales where a0 like \"%deep=3%\"")
df.collect.foreach(println)
aggDF.collect.foreach(println)



val df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").load("/home/martin/result002.csv")
df.printSchema()
df.registerTempTable("sales")
sqlContext.sql("select * from sales").take(30).foreach(println)
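
For eyeballing the result, show() is usually easier to read than collecting and printing every row; a small sketch, once the columns parse correctly:

df.show(30, false)          // first 30 rows, cells not truncated
sqlContext.sql("select * from sales where a0 like '%deep=3%'").show(30, false)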