Re: spark metrics question

2016-02-05 Thread Matt K
Yes. And what I'm trying to figure out is whether there's a way to package the jar so that I don't have to install it on every executor node. On Wed, Feb 3, 2016 at 7:46 PM, Yiannis Gkoufas wrote: > Hi Matt, > > does the custom class you want to package report metrics

Re: Kafka direct stream receiving rate

2016-02-05 Thread Cody Koeninger
If you're using the direct stream, you have 0 receivers. Do you mean you have 1 executor? Can you post the relevant call to createDirectStream from your code, as well as any relevant spark configuration? On Thu, Feb 4, 2016 at 8:13 PM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote:

RE: Can't view executor logs in web UI on Windows

2016-02-05 Thread Mark Pavey
We have created JIRA ticket https://issues.apache.org/jira/browse/SPARK-13142 and will submit a pull request next week. Mark -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: 01 February 2016 14:24 To: Mark Pavey Cc: user@spark.apache.org Subject: Re: Can't view

Re: Driver not able to restart the job automatically after the Kafka Direct streaming application went down

2016-02-05 Thread swetha kasireddy
Following is the error that I see when it retries. org.apache.spark.SparkException: Failed to read checkpoint from directory /share/checkpointDir at org.apache.spark.streaming.CheckpointReader$.read(Checkpoint.scala:342) at
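For context, driver restart with checkpointing goes through StreamingContext.getOrCreate; a minimal sketch (the app name, batch interval, and stream setup are assumptions for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "/share/checkpointDir"  // directory from the error above

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-direct-app")  // assumed app name
  val ssc = new StreamingContext(conf, Seconds(10))
  // ... build the Kafka direct stream and output operations here ...
  ssc.checkpoint(checkpointDir)
  ssc
}

// Recovers the context from the checkpoint if it is present and readable;
// otherwise builds a fresh one with createContext.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
```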

Re: kafkaDirectStream usage error

2016-02-05 Thread Cody Koeninger
2 things: - you're only attempting to read from a single TopicAndPartition. Since your topic has multiple partitions, this probably isn't what you want - you're getting an offset out of range exception because the offset you're asking for doesn't exist in kafka. Use the other
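The reply is truncated, but presumably it refers to the createDirectStream overload that takes a set of topics and lets Kafka resolve valid starting offsets, rather than an explicit fromOffsets map; a sketch (broker and topic names are assumptions):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map[String, String](
  "metadata.broker.list" -> "broker1:9092",  // assumed broker list
  "auto.offset.reset" -> "smallest")         // start from the earliest offsets Kafka still retains

// Reads all partitions of the topic, with starting offsets resolved by Kafka.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))
```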

Re: Kafka direct stream receiving rate

2016-02-05 Thread Cody Koeninger
How are you counting the number of messages? I'd go ahead and remove the settings for backpressure and maxrateperpartition, just to eliminate that as a variable. On Fri, Feb 5, 2016 at 12:22 PM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > I am using one directsream. Below is

Please Add Our Meetup to the Spark Meetup List

2016-02-05 Thread Timothy Spann
Our meetup is NJ Data Science - Apache Spark http://www.meetup.com/nj-datascience Princeton, NJ Past Meetups: Spark Streaming by Prasad Sripathi, airisDATA

Re: spark metrics question

2016-02-05 Thread Takeshi Yamamuro
How about using `spark.jars` to send jars into a cluster? On Sat, Feb 6, 2016 at 12:00 AM, Matt K wrote: > Yes. And what I'm trying to figure out is whether there's a way to package the jar > so that I don't have to install it on every executor node. > > > On Wed, Feb
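For concreteness, either form below ships the jar to executors automatically (paths are placeholders):

```
spark-submit --jars /path/to/custom-metrics.jar ...
# or, equivalently, in spark-defaults.conf:
spark.jars  /path/to/custom-metrics.jar
```

One caveat worth testing: a custom metrics sink class may also need to be on spark.executor.extraClassPath, since the metrics system can start before user jars are fetched.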

Re: Too many open files, why changing ulimit not effecting?

2016-02-05 Thread Nirav Patel
For CentOS there's also /etc/security/limits.d/90-nproc.conf that may need modification. Services that you expect to use the new limits need to be restarted. The simplest thing to do is to reboot the machine. On Fri, Feb 5, 2016 at 3:59 AM, Ted Yu wrote: > bq. and *"session
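For reference, typical entries in those files look like this (the limit values are illustrative only):

```
# /etc/security/limits.conf or /etc/security/limits.d/90-nproc.conf
*    soft    nofile    65536
*    hard    nofile    65536
```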

Spark process failing to receive data from the Kafka queue in yarn-client mode.

2016-02-05 Thread Rachana Srivastava
I am trying to run the following code in yarn-client mode but am getting the slow ReadProcessor error mentioned below; the code works just fine in local mode. Any pointer is really appreciated. Line of code to receive data from the Kafka queue: JavaPairReceiverInputDStream
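For context, a minimal receiver-based Kafka stream looks like this (sketched in Scala, though the original code is Java; the ZooKeeper quorum, group id, and topic map are assumptions):

```scala
import org.apache.spark.streaming.kafka.KafkaUtils

// One receiver thread consuming "mytopic" through the assumed ZooKeeper quorum.
val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "consumer-group", Map("mytopic" -> 1))
```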

Re: How to edit/delete a message posted in Apache Spark User List?

2016-02-05 Thread Luciano Resende
Please see http://www.apache.org/foundation/public-archives.html On Fri, Feb 5, 2016 at 9:35 AM, SRK wrote: > Hi, > > How do I edit/delete a message posted in Apache Spark User List? > > Thanks! > > > > -- > View this message in context: >

Re: Help needed in deleting a message posted in Spark User List

2016-02-05 Thread Marcelo Vanzin
You don't... just send a new one. On Fri, Feb 5, 2016 at 9:33 AM, swetha kasireddy wrote: > Hi, > > I want to edit/delete a message posted in Spark User List. How do I do that? > > Thanks! -- Marcelo

How to edit/delete a message posted in Apache Spark User List?

2016-02-05 Thread SRK
Hi, How do I edit/delete a message posted in Apache Spark User List? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-edit-delete-a-message-posted-in-Apache-Spark-User-List-tp26160.html Sent from the Apache Spark User List mailing list

Help needed in deleting a message posted in Spark User List

2016-02-05 Thread swetha kasireddy
Hi, I want to edit/delete a message posted in Spark User List. How do I do that? Thanks!

What is the best way to JOIN two 10TB csv files and three 100kb files on Spark?

2016-02-05 Thread Rex X
Dear all, The new DataFrame of Spark is extremely fast. But our cluster has limited RAM (~500GB). What is the best way to do such a big table join? Any sample code is greatly welcome! Best, Rex

Re: Kafka direct stream receiving rate

2016-02-05 Thread Diwakar Dhanuskodi
I am using one direct stream. Below is the call to createDirectStream: val topicSet = topics.split(",").toSet val kafkaParams = Map[String,String]("bootstrap.servers" -> "datanode4.isdp.com:9092") val k = KafkaUtils.createDirectStream[String,String,StringDecoder,StringDecoder](ssc, kafkaParams,
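The call is truncated above; a plausible completion, assuming it closes with the topic set, would be:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val topicSet = topics.split(",").toSet
val kafkaParams = Map[String, String]("bootstrap.servers" -> "datanode4.isdp.com:9092")
val k = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicSet)
```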

Re: Hadoop credentials missing in some tasks?

2016-02-05 Thread Peter Vandenabeele
On Fri, Feb 5, 2016 at 12:58 PM, Gerard Maas wrote: > Hi, > > We're facing a situation where simple queries to parquet files stored in > Swift through a Hive Metastore sometimes fail with this exception: > > org.apache.spark.SparkException: Job aborted due to stage

Re: What is the best way to JOIN two 10TB csv files and three 100kb files on Spark?

2016-02-05 Thread Takeshi Yamamuro
Hi, How about using broadcast joins? largeDf.join(broadcast(smallDf), "joinKey") On Sat, Feb 6, 2016 at 2:25 AM, Rex X wrote: > Dear all, > > The new DataFrame of Spark is extremely fast. But our cluster has limited > RAM (~500GB). > > What is the best way to do such a big
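A slightly fuller sketch of that suggestion, assuming the small files share a join key with the large tables (names are placeholders):

```scala
import org.apache.spark.sql.functions.broadcast

// Each ~100 KB table is shipped whole to every executor, so the 10 TB side
// is joined with a map-side hash join instead of a full shuffle.
val joined = largeDf1
  .join(largeDf2, "joinKey")            // the big-to-big join still shuffles
  .join(broadcast(smallDf), "joinKey")  // broadcast the small side
```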

Re: pyspark - spark history server

2016-02-05 Thread cs user
Hi Folks, So the fix for me was to copy this file on the nodes built with Ambari: /usr/hdp/2.3.4.0-3485/spark/lib/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar To this file on the client machine, external to the cluster: /opt/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar I

Failed to remove broadcast 2 with removeFromMaster = true in Graphx

2016-02-05 Thread Zhang, Jingyu
I'm running a Pregel function with 37 nodes in EMR Hadoop. After an hour the logs show the following. Can anyone please tell me what the problem is and how to solve it? Thanks 16/02/05 14:02:46 WARN BlockManagerMaster: Failed to remove broadcast 2 with removeFromMaster = true - Cannot receive any reply in

Re: Spark 1.6.0 HiveContext NPE

2016-02-05 Thread Ted Yu
Was there any other exception(s) in the client log ? Just want to find the cause for this NPE. Thanks On Wed, Feb 3, 2016 at 8:33 AM, Shipper, Jay [USA] wrote: > I’m upgrading an application from Spark 1.4.1 to Spark 1.6.0, and I’m > getting a NullPointerException from

Re: Kafka direct stream receiving rate

2016-02-05 Thread Diwakar Dhanuskodi
I am able to see the number of messages processed per event in the Spark Streaming web UI. Also I am counting the messages inside foreachRDD. Removed the settings for backpressure but still the same. Sent from Samsung Mobile. Original message From: Cody Koeninger

Re: Unit test with sqlContext

2016-02-05 Thread Steve Annessa
Thanks for all of the responses. I do have an afterAll that stops the sc. While looking over Holden's readme I noticed she mentioned "Make sure to disable parallel execution." That was what I was missing; I added the following to my build.sbt: ``` parallelExecution in Test := false ``` Now all of

Re: Kafka direct stream receiving rate

2016-02-05 Thread Cody Koeninger
Have you tried just printing each message, to see which ones are being processed? On Fri, Feb 5, 2016 at 1:41 PM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > I am able to see the number of messages processed per event in > the Spark Streaming web UI. Also I am counting the
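A minimal way to do that per batch (where stream is the DStream returned by createDirectStream):

```scala
// Log a per-batch count and a few sample records on the driver.
stream.foreachRDD { rdd =>
  println(s"batch count: ${rdd.count()}")
  rdd.take(5).foreach(println)
}
```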

Shuffle memory woes

2016-02-05 Thread Corey Nolet
I just recently had a discovery that my jobs were taking several hours to complete because of excess shuffle spills. What I found was that when I hit the high point where I didn't have enough memory for the shuffles to store all of their file consolidations at once, it could spill so many times

Re: Using jar bundled log4j.xml on worker nodes

2016-02-05 Thread Matthias Niehoff
Hmm, that seems to be the problem we are facing. But with --files you can only pass local files, not files on the classpath, so we would need a file outside of our jar... 2016-02-04 18:20 GMT+01:00 Ted Yu : > Have you taken a look at SPARK-11105 ? > > Cheers > > On Thu, Feb 4,

RE: pass one dataframe column value to another dataframe filter expression + Spark 1.5 + scala

2016-02-05 Thread Lohith Samaga M
Hi, If you can also format the condition file as a CSV file similar to the main file, then you can join the two dataframes and select only the required columns, as sketched below. Best regards / Mit freundlichen Grüßen / Sincères salutations M. Lohith Samaga From: Divya Gehlot
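A sketch of that approach with the spark-csv package (Spark 1.5 era; paths and column names are assumptions):

```scala
// Load both files as DataFrames, join on the shared key, keep only what's needed.
val main = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true").load("/path/to/main.csv")
val cond = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true").load("/path/to/conditions.csv")

val result = main.join(cond, "joinKey").select("colA", "colB")
```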

Too many open files, why changing ulimit not effecting?

2016-02-05 Thread Mohamed Nadjib MAMI
Hello all, I'm getting the famous java.io.FileNotFoundException: ... (Too many open files) exception. What seemed to have helped other people out hasn't helped me. I tried to set the ulimit via the command line ("ulimit -n"), then I tried to add the following lines to

Re: Spark Streaming - 1.6.0: mapWithState Kinesis huge memory usage

2016-02-05 Thread Udo Fholl
It does not look like it. Here is the output of "grep -A2 -i waiting spark_tdump.log": "RMI TCP Connection(idle)" daemon prio=5 tid=156 TIMED_WAITING at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) -- "task-result-getter-1" daemon

DenseMatrix update

2016-02-05 Thread Zapper22
There was an update method in Spark 1.3.1: https://spark.apache.org/docs/1.3.1/api/java/org/apache/spark/mllib/linalg/DenseMatrix.html But in Spark 1.6.0 there is no update method: https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/mllib/linalg/DenseMatrix.html My idea is to store large
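One possible workaround, since the values array is still public in 1.6.0, is to write through the column-major backing array directly (a sketch assuming the default, non-transposed layout):

```scala
import org.apache.spark.mllib.linalg.DenseMatrix

val m = new DenseMatrix(2, 2, Array(1.0, 2.0, 3.0, 4.0))  // stored column-major

// Equivalent of the removed update(i, j, v): index into the backing array.
def set(m: DenseMatrix, i: Int, j: Int, v: Double): Unit =
  m.values(i + j * m.numRows) = v

set(m, 0, 1, 9.0)  // entry at row 0, column 1 is now 9.0
```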

Hadoop credentials missing in some tasks?

2016-02-05 Thread Gerard Maas
Hi, We're facing a situation where simple queries to parquet files stored in Swift through a Hive Metastore sometimes fail with this exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 58.0 failed 4 times, most recent failure: Lost task 6.3 in stage 58.0

Re: Too many open files, why changing ulimit not effecting?

2016-02-05 Thread Ted Yu
bq. and *"session required pam_limits.so"*. What was the second file you modified? Did you make the change on all the nodes? Please see the verification step in https://easyengine.io/tutorials/linux/increase-open-files-limit/ On Fri, Feb 5, 2016 at 1:42 AM, Mohamed Nadjib MAMI

pyspark - spark history server

2016-02-05 Thread cs user
Hi All, I'm having trouble getting a job to use the Spark history server. We have a cluster configured with Ambari. If I run the job from one of the nodes within the Ambari-configured cluster, everything works fine: the job appears in the Spark history server. If I configure a client external to
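For context, an external client normally needs at least the event-log and history-server settings below in its spark-defaults.conf (values here are placeholders, not taken from the thread):

```
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs:///spark-history
spark.yarn.historyServer.address  historyserver.example.com:18080
```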

Re: Please Add Our Meetup to the Spark Meetup List

2016-02-05 Thread Tushar R Kale
Hi Timothy, It is the Apache Spark admin who adds groups to the WW Spark Meetup list. The correct email address for your request is user@spark.apache.org. Hope this helps. Thank you and best regards, Tushar Kale BIG DATA Evangelist Strategy & Analytics - Big

Re: pass one dataframe column value to another dataframe filter expression + Spark 1.5 + scala

2016-02-05 Thread Ali Tajeldin
I think the tricky part here is that the join condition is encoded in the second data frame and not a direct value. Assuming the second data frame (the tags) is small enough, you can collect it (read it into memory) and then construct a "when" expression chain for each of the possible tags,
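A sketch of that idea (column names and the shape of the tag table are assumptions):

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// Collect the small tag table, then fold it into one chained when/otherwise column.
val tags = tagsDf.collect()  // assumed columns: pattern (0), tag (1)
val tagExpr = tags.foldLeft(lit("untagged")) { (acc, row) =>
  when(col("description").contains(row.getString(0)), row.getString(1)).otherwise(acc)
}
val tagged = mainDf.withColumn("tag", tagExpr)
```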

RE: different behavior while using createDataFrame and read.df in SparkR

2016-02-05 Thread Sun, Rui
I guess this is related to https://issues.apache.org/jira/browse/SPARK-11976. When calling createDataFrame on iris, the “.” character in column names will be replaced with “_”. It seems that when you create a DataFrame from the CSV file, the “.” character in column names is still there. From: