Job Error: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@130.1.10.108:23600/)

2015-12-24 Thread donhoff_h
Hi folks, I wrote some Spark jobs and these jobs ran successfully when I ran them one by one. But if I run them concurrently, for example 12 jobs running in parallel, I get the following error. Could anybody tell me what causes this? How can I solve it? Many thanks! Exception in thread "main"

Spark Streaming - print accumulators value every period as logs

2015-12-24 Thread Roberto Coluccio
Hello, I have a batch driver and a streaming driver using the same functions (Scala). I use accumulators (passed to the functions' constructors) to count stuff. In the batch driver, doing so at the right point of the pipeline, I'm able to retrieve the accumulator value and print it as a log4j log. In the
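For context, a minimal sketch of the streaming side, assuming Spark 1.5: the accumulator can be read inside foreachRDD, which runs on the driver after every micro-batch, so its value can be logged once per period (the socket source and all names here are hypothetical):

    import org.apache.log4j.Logger
    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val log = Logger.getLogger("accumulator-demo")
    val sc = new SparkContext("local[2]", "acc-demo")
    val ssc = new StreamingContext(sc, Seconds(10))
    val counter = sc.accumulator(0L, "records-seen")

    ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
      rdd.foreach(_ => counter += 1L)               // executors update the count
      log.info(s"records so far: ${counter.value}") // driver reads it each batch
    }
    ssc.start()
    ssc.awaitTermination()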

How can I get the column data based on a specific column name and then store these data in an array or list?

2015-12-24 Thread zml张明磊
Hi, I am new to Scala and Spark and am trying to find the relevant API in DataFrame to solve my problem, as the title describes. However, I have only found the API DataFrame.col(colName : String) : Column, which returns a Column object, not the content. If only DataFrame supported such an API which
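For reference, a minimal sketch of one common workaround in Spark 1.5 (df and the column name are assumptions): select the column, map each Row to its value, and collect to a local array:

    // df is an existing DataFrame with a string column "colName"
    val values: Array[String] = df.select("colName").map(_.getString(0)).collect()

DataFrame.map goes through an RDD[Row], so dropping the final collect() keeps the data distributed as an RDD[String] instead of a local array.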

Re: How to ignore case in dataframe groupby?

2015-12-24 Thread Yanbo Liang
You can use DF.groupBy(upper(col("a"))).agg(sum(col("b"))). The DataFrame API provides an "upper" function to convert a column to uppercase. 2015-12-24 20:47 GMT+08:00 Eran Witkon : > Use DF.withColumn("upper-code", upper(df("countrycode"))) > or just run a map function that does the same
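Put together, a minimal sketch of the case-insensitive aggregation (column names are assumptions):

    import org.apache.spark.sql.functions.{col, sum, upper}

    df.groupBy(upper(col("countrycode"))).agg(sum(col("b"))).show()
    // or, matching the original groupBy & count question:
    df.groupBy(upper(col("countrycode"))).count().show()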

Re: Hive error when starting up spark-shell in 1.5.2

2015-12-24 Thread Marco Mistroni
No luck. But two updates: 1. I have downloaded spark-1.4.1 and everything works fine; I don't see any error. 2. I have added the following XML file to Spark 1.5.2's conf directory and now I get the following error: Caused by: java.lang.RuntimeException: The root scratch dir: c:/Users/marco/tmp on

how to debug java.lang.IllegalArgumentException: object is not an instance of declaring class

2015-12-24 Thread Andy Davidson
Hi, any idea how I can debug this problem? I suspect the problem has to do with how I am converting a JavaRDD<...> to a DataFrame. Is it a boxing problem? I tried to use long and double instead of Long and Double whenever possible. Thanks in advance, Happy Holidays. Andy

Re: how to debug java.lang.IllegalArgumentException: object is not an instance of declaring class

2015-12-24 Thread Andy Davidson
The problem must be with how I am converting a JavaRDD<...> to a DataFrame. Any suggestions? Most of my work has been done using PySpark. Tuples are a lot harder to work with in Java. JavaRDD<...> predictions = idLabeledPoingRDD.map((Tuple2<...> t2)

Extract compressed JSON within JSON

2015-12-24 Thread Eran Witkon
Hi, I have a JSON file with the following row format: {"cty":"United Kingdom","gzip":"H4sIAKtWystVslJQcs4rLVHSUUouqQTxQvMyS1JTFLwz89JT8nOB4hnFqSBxj/zS4lSF/DQFl9S83MSibKBMZVExSMbQwNBM19DA2FSpFgDvJUGVUw==","nm":"Edmund lronside","yrs":"1016"} The gzip field is a compressed JSON by
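For what it's worth, a minimal sketch of decoding such a field, assuming the gzip value is base64-encoded and Java 8 is available:

    import java.io.ByteArrayInputStream
    import java.util.Base64
    import java.util.zip.GZIPInputStream
    import scala.io.Source

    def unGzip(b64: String): String = {
      val bytes = Base64.getDecoder.decode(b64)            // base64 -> gzip bytes
      val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
      Source.fromInputStream(in, "UTF-8").mkString         // inner JSON as a string
    }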

Re: Using Java Function API with Java 8

2015-12-24 Thread Sean Owen
You forgot a return statement in the 'else' clause, which is what the compiler is telling you. There's nothing more to it. Your function can be much simpler, however: Function<String, Boolean> checkHeaders2 = (x -> x.startsWith("npi") || x.startsWith("CPT")); On Thu, Dec 24, 2015 at 1:13 AM,

Re: Spark Streaming 1.5.2+Kafka+Python. Strange reading

2015-12-24 Thread Akhil Das
Would you mind posting the relevant code snippet? Thanks. Best Regards On Wed, Dec 23, 2015 at 7:33 PM, Vyacheslav Yanuk wrote: > Hi. > I have a very strange situation with direct reading from Kafka. > For example: > I have 1000 messages in Kafka. > After submitting my

Re: How to Parse & flatten JSON object in a text file using Spark into Dataframe

2015-12-24 Thread Eran Witkon
Raja! I found the answer to your question! Look at http://stackoverflow.com/questions/34069282/how-to-query-json-data-column-using-spark-dataframes. This is what you (and I) were looking for. The general idea: you read the list as text, where project Details is just a string field, and then you build the
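In code, a minimal sketch of that idea (df and the column name "details" are hypothetical): pull the JSON string column out as an RDD[String] and let read.json infer its schema:

    val inner = sqlContext.read.json(df.select("details").map(_.getString(0)))
    inner.printSchema()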

Re: error in spark cassandra connector

2015-12-24 Thread Ted Yu
Mind providing a bit more detail?
- Release of Spark
- Version of the Cassandra connector
- How the job was submitted
- Complete stack trace
Thanks On Thu, Dec 24, 2015 at 2:06 AM, Vijay Kandiboyina wrote: > java.lang.NoClassDefFoundError: >

Re: Extract compressed JSON within JSON

2015-12-24 Thread Eran Witkon
Answered on StackOverflow. If you are looking for the solution, this is the trick: val jsonNested = sqlContext.read.json(jsonUnGzip.map{ case Row(cty: String, json: String, nm: String, yrs: String) => s"""{"cty": "$cty", "extractedJson": $json, "nm": "$nm", "yrs": "$yrs"}""" }) See this link

Re: How to contribute by picking up starter bugs

2015-12-24 Thread Ted Yu
You can send out a pull request for the JIRA you're interested in. Start the title of the pull request with: [SPARK-XYZ] ... where XYZ is the JIRA number. The pull request will then be posted on the JIRA. After the pull request is reviewed, tested by QA, and merged, the committer will assign your name to the

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread Bryan
Are you using a direct stream consumer, or the older receiver-based consumer? If the latter, does the number of partitions you've specified for your topic match the number of partitions in the topic on Kafka? That would be a possible cause, as you might receive all data from a given partition

How to ignore case in dataframe groupby?

2015-12-24 Thread Bharathi Raja
Hi, Values in a DataFrame column named countrycode are in different cases, e.g. (US, us). groupBy & count gives two rows, but the requirement is to ignore case for this operation. 1) Is there a way to ignore case in groupBy? Or 2) Is there a way to update the DataFrame column countrycode to

How to contribute by picking up starter bugs

2015-12-24 Thread lokeshkumar
Hi, From the "how to contribute" page of the Spark JIRA project I learned that I can start by picking up bugs with the starter label. But who will assign these bugs to me? Or should I just fix them and create a pull request? I will be glad to help the project.

error in spark cassandra connector

2015-12-24 Thread Vijay Kandiboyina
java.lang.NoClassDefFoundError: com/datastax/spark/connector/rdd/CassandraTableScanRDD
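A NoClassDefFoundError for connector classes usually means the connector jar never reached the driver or executors. One way to ship it is spark-submit's --packages flag; the version coordinate below is an assumption and should match your Spark release:

    spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M3 ...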

Newbie Help for spark's not finding native hadoop warning

2015-12-24 Thread Bilinmek Istemiyor
Hello, I have Apache Spark 1.5.1 installed with the help of this user group. I receive the following warning when I start the PySpark shell: WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. Later I downloaded the native binary from

Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread vivek.meghanathan
Hi All, We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform. Our Spark Streaming job (consumer) is not receiving all the messages sent to the specific topic. It receives 1 out of ~50 messages (we added a log in the job stream and identified this). We are not seeing any errors in the

RE: How to Parse & flatten JSON object in a text file using Spark into Dataframe

2015-12-24 Thread Bharathi Raja
Thanks Eran, I'll check the solution. Regards, Raja

Re: How to ignore case in dataframe groupby?

2015-12-24 Thread Eran Witkon
Use DF.withColumn("upper-code", upper(df("countrycode"))) or just run a map function that does the same On Thu, Dec 24, 2015 at 2:05 PM Bharathi Raja wrote: > Hi, > Values in a DataFrame column named countrycode are in different cases. Eg: > (US, us). groupBy & count

Re: running lda in spark throws exception

2015-12-24 Thread Li Li
Could anyone help? On Wed, Dec 23, 2015 at 1:40 PM, Li Li wrote: > I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2. > It throws an exception at the line: Matrix topics = ldaModel.topicsMatrix(); > But in the YARN job history UI, it shows as successful. What's wrong with it? >
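For context, a minimal Scala sketch of the call in question against the MLlib 1.5 API, with toy input (the corpus here is an assumption):

    import org.apache.spark.mllib.clustering.LDA
    import org.apache.spark.mllib.linalg.{Matrix, Vectors}

    // corpus: (document id, term-count vector)
    val corpus = sc.parallelize(Seq(
      (0L, Vectors.dense(1.0, 2.0, 0.0)),
      (1L, Vectors.dense(0.0, 1.0, 3.0))))
    val ldaModel = new LDA().setK(2).run(corpus)
    val topics: Matrix = ldaModel.topicsMatrix  // the line that reportedly throws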

Re: Newbie Help for spark's not finding native hadoop warning

2015-12-24 Thread Sean Owen
You can safely ignore it. Native libs aren't set with HADOOP_HOME. See Hadoop docs on how to configure this if you're curious, but you really don't need to. On Thu, Dec 24, 2015 at 12:19 PM, Bilinmek Istemiyor wrote: > Hello, > > I have apache spark 1.5.1 installed with the

Re: Newbie Help for spark's not finding native hadoop warning

2015-12-24 Thread Jacek Laskowski
Hi, To add to it, you can read about the native libs at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html. Regards, Jacek Jacek Laskowski | https://medium.com/@jaceklaskowski/ Mastering Apache Spark ==>

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread vivek.meghanathan
We are using the older receiver-based approach; the number of partitions is 1 (we have a single-node Kafka) and we use a single thread per topic, yet we still have the problem. Please see the API we use. All 8 Spark jobs use the same group name – is that a problem? val topicMap =
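For comparison, a minimal sketch of the direct (receiver-less) consumer Bryan mentioned, which reads every Kafka partition without consumer-group balancing (ssc, the broker address, and the topic name are assumptions):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "kafka-host:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))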