Re: spark-submit stuck and no output in console

2015-11-17 Thread Kayode Odeyemi
Sonal, SparkPi couldn't run either. It stuck to the screen with no output: hadoop-user@yks-hadoop-m01:/usr/local/spark$ ./bin/run-example SparkPi On Tue, Nov 17, 2015 at 12:22 PM, Steve Loughran wrote: > 48 hours is one of those kerberos warning times (as is 24h, 72h and 7

Working with RDD from Java

2015-11-17 Thread frula00
Hi, I'm working in Java, with Spark 1.3.1 - I am trying to extract data from the RDD returned by org.apache.spark.mllib.clustering.DistributedLDAModel.topicDistributions() (the return type is RDD<Tuple2<Object, Vector>>). How do I work with it from within Java? I can't seem to cast it to JavaPairRDD

Count of streams processed

2015-11-17 Thread Chandra Mohan, Ananda Vel Murugan
Hi, Is it possible to have a running count of the number of Kafka messages processed in a Spark Streaming application? Thanks & Regards, Anand.C

Re: large, dense matrix multiplication

2015-11-17 Thread Eilidh Troup
Hi Burak, That’s interesting. I’ll try and give it a go. Eilidh On 14 Nov 2015, at 04:19, Burak Yavuz wrote: > Hi, > > The BlockMatrix multiplication should be much more efficient on the current > master (and will be available with Spark 1.6). Could you please give that a
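
For readers following along, a minimal sketch of the BlockMatrix multiplication being discussed (the block sizes and the input RDD of entries are placeholders, not from the thread):

    import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
    import org.apache.spark.rdd.RDD

    val entries: RDD[MatrixEntry] = ???  // (row, col, value) triples built elsewhere
    val bm: BlockMatrix = new CoordinateMatrix(entries).toBlockMatrix(1024, 1024).cache()
    val product: BlockMatrix = bm.multiply(bm.transpose)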

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread Ted Yu
I am a bit curious: HBase depends on HDFS. Has HDFS support for Mesos been fully implemented? Last time I checked, there was still work to be done. Thanks > On Nov 17, 2015, at 1:06 AM, 임정택 wrote: > > Oh, one thing I missed is, I built Spark 1.4.1 Cluster with 6 nodes of

Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Iulian Dragoș
Hi Jo, I agree that there's something fishy with the cluster dispatcher, I've seen some issues like that. I think it actually tries to send all properties as part of `SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:

Re: Conf Settings in Mesos

2015-11-17 Thread Iulian Dragoș
Hi John, I don't think this is specific to Mesos. Note that `spark-defaults.conf` are only defaults. Normally you'd pass your specific options using `--conf`. Does that work? iulian On Thu, Nov 12, 2015 at 3:05 PM, John Omernik wrote: > Hey all, > > I noticed today that if
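
For example (the option name and value below are chosen for illustration):

    spark-submit --conf spark.executor.memory=4g --class my.Main my-app.jar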

Re: spark-submit stuck and no output in console

2015-11-17 Thread Kayode Odeyemi
Our hadoop NFS Gateway seems to have been malfunctioning. I basically restarted it. Now Spark jobs have resumed successfully. Problem solved.

Re: Parallelizing operations using Spark

2015-11-17 Thread PhuDuc Nguyen
You should try passing your solr writer into rdd.foreachPartition() for max parallelism - each partition on each executor will execute the function passed in. HTH, Duc On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar wrote: > Any input/suggestions on parallelizing below
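
A minimal sketch of that suggestion (the SolrJ endpoint, collection name, and the toSolrInputDocument converter are hypothetical, not from the thread):

    rdd.foreachPartition { docs =>
      // one client per partition, created on the executor rather than the driver
      val solr = new org.apache.solr.client.solrj.impl.HttpSolrClient(
        "http://solr-host:8983/solr/mycollection")  // hypothetical endpoint
      docs.foreach(doc => solr.add(toSolrInputDocument(doc)))  // user-supplied converter
      solr.commit()
      solr.close()
    }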

Re: ISDATE Function

2015-11-17 Thread Ted Yu
ISDATE() is currently not supported. Since it is SQL Server specific, I guess it wouldn't be added to Spark. On Mon, Nov 16, 2015 at 10:46 PM, Ravisankar Mani wrote: > Hi Everyone, > > In MSSQL Server, the "ISDATE()" function is used to find whether current > column values are date
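
As a workaround, one could register a small UDF with similar semantics; a sketch, assuming dates arrive in a single known format:

    sqlContext.udf.register("isdate", (s: String) => {
      val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd")
      fmt.setLenient(false)  // reject values like 2015-13-45
      try { fmt.parse(s); true } catch { case _: java.text.ParseException => false }
    })
    // usage: sqlContext.sql("SELECT * FROM t WHERE isdate(col)")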

Re: Spark Job is getting killed after certain hours

2015-11-17 Thread Nikhil Gs
Hello Everyone, Firstly, thank you so much for the response. In our cluster, we are using Spark 1.3.0 and our cluster version is CDH 5.4.1. Yes, we are also using Kerberos in our cluster and the Kerberos version is 1.10.3. The error "GSS initiate failed [Caused by GSSException: No valid

Re: Count of streams processed

2015-11-17 Thread Cody Koeninger
Sure, just call count on each rdd and track it in your driver however you want. If count is called directly on a kafkardd (e.g. createDirectStream, then foreachRDD before doing any other transformations), it should just be using the beginning and ending offsets rather than doing any real work.
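
A minimal sketch of that approach, assuming a direct stream named stream:

    var total = 0L  // driver-side running count
    stream.foreachRDD { rdd =>
      total += rdd.count()  // per the note above, on a KafkaRDD this uses offsets, not a scan
      println(s"messages processed so far: $total")
    }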

Re: Parallelizing operations using Spark

2015-11-17 Thread Susheel Kumar
Any input/suggestions on parallelizing the below operations using Spark over Java thread pooling: - reading 100 thousand JSON files from the local file system - processing each file's content and submitting it to Solr as an input document. Thanks, Susheel On Mon, Nov 16, 2015 at 5:44 PM, Susheel Kumar

Re: synchronizing streams of different kafka topics

2015-11-17 Thread Cody Koeninger
Are you using the direct stream? Each batch should contain all of the unprocessed messages for each topic, unless you're doing some kind of rate limiting. On Tue, Nov 17, 2015 at 3:07 AM, Antony Mayi wrote: > Hi, > > I have two streams coming from two different
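
For reference, a sketch of one direct stream subscribed to both topics (Spark 1.x spark-streaming-kafka; the broker and topic names are placeholders), so each batch carries the unprocessed messages of both:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("topicA", "topicB"))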

Re: How to create nested structure from RDD

2015-11-17 Thread Dean Wampler
Crap. Hit send accidentally... In pseudocode, assuming comma-separated input data: scala> case class Address(street: String, city: String) scala> case class User (name: String, address: Address) scala> val df = sc.textFile("/path/to/stuff"). map { line => val array = line.split(",") //
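
A hedged completion of the truncated pseudocode above (assuming three comma-separated fields and an available sqlContext):

    case class Address(street: String, city: String)
    case class User(name: String, address: Address)

    import sqlContext.implicits._
    val df = sc.textFile("/path/to/stuff").map { line =>
      val array = line.split(",")
      User(array(0), Address(array(1), array(2)))  // name, street, city
    }.toDF()
    // nested fields are reachable with dot notation, e.g. df.select("address.city")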

Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
Hi Thanks I am new to spark development, so can you provide some help with writing a custom partitioner to achieve this? If you have any link or example for writing a custom partitioner, please share it with me. On Mon, Nov 16, 2015 at 6:13 PM, Sabarish Sasidharan < sabarish.sasidha...@manthan.com> wrote: >

How to create nested structure from RDD

2015-11-17 Thread fhussonnois
Hi, I need to convert an RDD[User] to a DataFrame containing a single column named "user". The column "user" should be a nested struct with all User properties. How can I implement this efficiently? Thank you in advance

Re: How to create nested structure from RDD

2015-11-17 Thread Dean Wampler
One way to do it, in the Scala API: you would use a tuple or case class with nested tuples or case classes and/or primitives. It works fine if you convert to a DataFrame, too; you can reference nested elements using dot notation. I think in Python it would work similarly. In pseudocode, assuming

Re: how can evenly distribute my records in all partition

2015-11-17 Thread Ted Yu
Please take a look at the following for example: ./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala ./core/src/main/scala/org/apache/spark/Partitioner.scala Cheers On Tue, Nov 17, 2015 at 9:24 AM, prateek arora wrote: > Hi > Thanks > I am new

Re: Spark Job is getting killed after certain hours

2015-11-17 Thread Steve Loughran
On 17 Nov 2015, at 15:39, Nikhil Gs wrote: Hello Everyone, Firstly, thank you so much for the response. In our cluster, we are using Spark 1.3.0 and our cluster version is CDH 5.4.1. Yes, we are also using Kerberos in our cluster

Distinct on key-value pair of JavaRDD

2015-11-17 Thread Ramkumar V
Hi, I have a JavaRDD of key/value pairs. I would like to do distinct only on the key, but the normal distinct applies to both key and value; I want to apply it only on the key. How to do that? Any help is appreciated. Thanks,
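
One common approach, sketched in Scala for brevity: reduceByKey keeps exactly one pair per key, which is effectively a distinct on the key alone:

    // keeps an arbitrary value per key; swap in a real merge function if the choice matters
    val distinctByKey = pairRdd.reduceByKey((a, b) => a)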

Re: spark-submit stuck and no output in console

2015-11-17 Thread Sonal Goyal
I would suggest a few things to try: A. Try running the example program with master as local[*]. See if spark can run locally or not. B. Check spark master and worker logs. C. Check if normal hadoop jobs can be run properly on the cluster. D. Check spark master webui and see health of

Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Jo Voordeckers
On Tue, Nov 17, 2015 at 5:16 AM, Iulian Dragoș wrote: > I think it actually tries to send all properties as part of > `SPARK_EXECUTOR_OPTS`, which may not be everything that's needed: > > >

Issue while Spark Job fetching data from Cassandra DB

2015-11-17 Thread satish chandra j
Hi All, I am getting an "UnauthorizedException: User has no SELECT permission on or any of its parents" error while the Spark job is fetching data from Cassandra, but I am able to save data into Cassandra without any issues. Note: with the same user, I am able to access and query the table in

Re: [SPARK STREAMING] Questions regarding foreachPartition

2015-11-17 Thread Nipun Arora
Thanks Cody, that's what I thought. Currently in the cases where I want global ordering, I am doing a collect() call and going through everything in the client. I wonder if there is a way to do a globally ordered execution across micro-batches in a better way? I am having some trouble with

Re: WARN LoadSnappy: Snappy native library not loaded

2015-11-17 Thread Andy Davidson
On my master: grep native /root/spark/conf/spark-env.sh SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/" $ ls /root/ephemeral-hdfs/lib/native/ libhadoop.a libhadoop.so libhadooputils.a libsnappy.so libsnappy.so.1.1.3 Linux-i386-32

pyspark ML pipeline with shared data

2015-11-17 Thread Dominik Dahlem
Hi all, I'm working on a pipeline for collaborative filtering. Taking the movielens example, I have a data frame with the columns 'userID', 'movieID', and 'rating'. I would like to transform the ratings before calling ALS and denormalise after. I implemented two transformers to do this, but I'm

Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
Hi I am trying to implement a custom partitioner using this link http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where (in the link's example the key value is from 0 to (noOfElement - 1)) but am not able to understand how to implement

Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Jo Voordeckers
Hi Tim, I've done more forensics on this bug, see my comment here: https://issues.apache.org/jira/browse/SPARK-11327?focusedCommentId=15009843&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15009843 - Jo Voordeckers On Tue, Nov 17, 2015 at 12:01 PM, Timothy Chen

Re: Working with RDD from Java

2015-11-17 Thread Sabarish Sasidharan
You can also do rdd.toJavaRDD(). Pls check the API docs Regards Sab On 18-Nov-2015 3:12 am, "Bryan Cutler" wrote: > Hi Ivan, > > Since Spark 1.4.1 there is a Java-friendly function in LDAModel to get the > topic distributions called javaTopicDistributions() that returns a >

RE: Any way to get raw score from MultilayerPerceptronClassificationModel ?

2015-11-17 Thread Ulanov, Alexander
Hi Robert, Raw scores are not available through the public API. It would be great to add this feature; it seems that we overlooked it. The simple way to access the raw predictions currently would be to create a wrapper for mlpModel. This wrapper should be defined in the [ml] package. One needs to

Re: Any way to get raw score from MultilayerPerceptronClassificationModel ?

2015-11-17 Thread Robert Dodier
On Tue, Nov 17, 2015 at 2:36 PM, Ulanov, Alexander wrote: > Raw scores are not available through the public API. > It would be great to add this feature, it seems that we overlooked it. OK, thanks for the info. > The better way would be to write a new implementation

Spark LogisticRegression returns scaled coefficients

2015-11-17 Thread njoshi
I am testing the LogisticRegression performance on synthetically generated data. The weights I have as input are w = [2, 3, 4] with no intercept and three features. After training on 1000 synthetically generated data points, assuming a random normal distribution for each, the Spark

Is there a way to delete task history besides using a ttl?

2015-11-17 Thread Jonathan Coveney
so I have the following: broadcast some stuff, cache an RDD, do a bunch of stuff, eventually calling actions which reduce it to an acceptable size. I'm getting an OOM on the driver (well, GC is getting out of control), largely because I have a lot of partitions and it looks like the job history

Re: kafka streaming 1.5.2 - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2015-11-17 Thread Tathagata Das
Are you creating a fat assembly jar with spark-streaming-kafka included and using that to run your code? If yes, I am not sure why it is not finding it. If not, then make sure that your framework places the spark-streaming-kafka jar in the runtime classpath. On Tue, Nov 17, 2015 at 6:04 PM,

Spark build error

2015-11-17 Thread 金国栋
Hi! I tried to build spark source code from github, and I successfully built it from the command line using `sbt/sbt assembly`. However, I encountered an error when compiling the project in Intellij IDEA (V14.1.5). The error log is below: Error:scala: while compiling:

RE: TightVNC - Application Monitor (right pane)

2015-11-17 Thread Tim Barthram
Hi, I have a spark kafka streaming application that works when I run with a local spark context, but not with a remote one. My code consists of: 1. A spring-boot application that creates the context 2. A shaded jar file containing all of my spark code On my pc (windows), I have a

Re: Spark LogisticRegression returns scaled coefficients

2015-11-17 Thread DB Tsai
How do you compute the probability given the weights? Also, given a probability, you need to sample positive and negative based on the probability, and how do you do this? I'm pretty sure that the LoR will give you correct weights, and please see the generateMultinomialLogisticInput in
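
For clarity, a sketch of the generation step being described: compute the sigmoid probability from the true weights, then sample the label rather than thresholding it (w and x are assumed to be equal-length arrays):

    import scala.util.Random

    def sampleLabel(w: Array[Double], x: Array[Double], rng: Random): Double = {
      val margin = (w, x).zipped.map(_ * _).sum  // w . x, no intercept
      val p = 1.0 / (1.0 + math.exp(-margin))    // sigmoid probability of class 1
      if (rng.nextDouble() < p) 1.0 else 0.0     // sample, don't threshold
    }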

Re: Streaming Job gives error after changing to version 1.5.2

2015-11-17 Thread Tathagata Das
Are you running 1.5.2-compiled jar on a Spark 1.5.2 cluster? On Tue, Nov 17, 2015 at 5:34 PM, swetha wrote: > > > Hi, > > I see java.lang.NoClassDefFoundError after changing the Streaming job > version to 1.5.2. Any idea as to why this is happening? Following are my

Streaming Job gives error after changing to version 1.5.2

2015-11-17 Thread swetha
Hi, I see java.lang.NoClassDefFoundError after changing the Streaming job version to 1.5.2. Any idea as to why this is happening? Following are my dependencies and the error that I get: <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId> <version>${sparkVersion}</version> </dependency>

kafka streaming 1.5.2 - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2015-11-17 Thread tim_b123
Hi, I have a spark kafka streaming application that works when I run with a local spark context, but not with a remote one. My code consists of: 1. A spring-boot application that creates the context 2. A shaded jar file containing all of my spark code On my pc (windows), I have a

Re: Calculating Timeseries Aggregation

2015-11-17 Thread Tathagata Das
For this sort of long-term aggregation you should use a dedicated data storage system, like a database or a key-value store. Spark Streaming would just aggregate and push the necessary data to the data store. TD On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta wrote:
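
In outline, that pattern looks like the sketch below (assuming a DStream of (key, count) pairs named counts; the KVStore client is hypothetical):

    counts.foreachRDD { rdd =>
      rdd.foreachPartition { iter =>
        val store = KVStore.connect("store-host")  // hypothetical client, one per partition
        iter.foreach { case (key, value) => store.increment(key, value) }
        store.close()
      }
    }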

Re: spark with breeze error of NoClassDefFoundError

2015-11-17 Thread Ted Yu
Looking in local maven repo, breeze_2.10-0.7.jar contains DefaultArrayValue : jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep !$ jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep DefaultArrayValue 369 Wed Mar

Additional Master daemon classpath

2015-11-17 Thread Michal Klos
Hi, We are running a Spark Standalone cluster on EMR (note: not using YARN) and are trying to use S3 w/ EmrFS as our event logging directory. We are having difficulties with a ClassNotFoundException on EmrFileSystem when we navigate to the event log screen. This is to be expected as the EmrFs

Re: How to properly read the first number lines of file into a RDD

2015-11-17 Thread Zhiliang Zhu
Thanks a lot for your reply. I have also worked it out in some other ways. In fact, at first I was thinking about using filter to do it but failed. On Monday, November 9, 2015 9:52 PM, Akhil Das wrote: There are multiple ways to achieve this: 1. Read the N
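
One such way, sketched: pair each line with its index and filter, which avoids collecting the whole file on the driver:

    val n = 100L  // however many lines are needed
    val firstN = sc.textFile("/path/to/file")
      .zipWithIndex()                       // (line, 0-based index)
      .filter { case (_, idx) => idx < n }
      .map { case (line, _) => line }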

Re: Spark LogisticRegression returns scaled coefficients

2015-11-17 Thread Nikhil Joshi
Hi, Wonderful. I was sampling the output, but with a bug. Your comment brought the realization :). I was indeed victimized by the complete separability issue :). Thanks a lot. with regards, Nikhil On Tue, Nov 17, 2015 at 5:26 PM, DB Tsai wrote: > How do you compute the

Re: Streaming Job gives error after changing to version 1.5.2

2015-11-17 Thread swetha kasireddy
I see this error locally. On Tue, Nov 17, 2015 at 5:44 PM, Tathagata Das wrote: > Are you running 1.5.2-compiled jar on a Spark 1.5.2 cluster? > > On Tue, Nov 17, 2015 at 5:34 PM, swetha wrote: >> >> >> Hi, >> >> I see

Re: Issue while Spark Job fetching data from Cassandra DB

2015-11-17 Thread satish chandra j
Hi All, I am getting an "UnauthorizedException: User has no SELECT permission on or any of its parents" error while the Spark job is fetching data from Cassandra, but I am able to save data into Cassandra without any issues. Note: with the same user, I am able to access and query the table in

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Ted, Could you elaborate, please? I maintain separate HBase and Mesos clusters for some reasons, and I can just make it work via spark-submit or spark-shell / zeppelin with a newly initialized SparkContext. Thanks, Jungtaek Lim (HeartSaVioR) 2015-11-17 22:17 GMT+09:00 Ted Yu

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread Ted Yu
I see - your HBase cluster is separate from Mesos cluster. I somehow got (incorrect) impression that HBase cluster runs on Mesos. On Tue, Nov 17, 2015 at 7:53 PM, 임정택 wrote: > Ted, > > Could you elaborate, please? > > I maintain separated HBase cluster and Mesos cluster for

Incorrect results with reduceByKey

2015-11-17 Thread tovbinm
Howdy, We've noticed a strange behavior with Avro serialized data and reduceByKey RDD functionality. Please see below: // We're reading a bunch of Avro serialized data val data: RDD[T] = sparkContext.hadoopFile(path, classOf[AvroInputFormat[T]], classOf[AvroWrapper[T]], classOf[NullWritable])
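
A hedged guess at the usual culprit (the excerpt is truncated, so this may not be the poster's issue): Hadoop input formats reuse the same wrapper/record instance, so shuffling or caching without copying can make many records alias the last one read. A sketch of the common fix, assuming a generated Avro class MyRecord and the (key, value) pairs that hadoopFile actually returns:

    import org.apache.avro.specific.SpecificData

    val records = data.map { case (wrapper, _) =>
      // deep-copy each datum before reduceByKey so reused buffers can't alias
      SpecificData.get().deepCopy(MyRecord.getClassSchema, wrapper.datum())
    }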

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Ted, Thanks for the reply. My fat jar's only Spark-related dependency is spark-core, marked as "provided". It seems like Spark only adds hbase-common 0.98.7-hadoop2 in the spark-examples module. And if there are two hbase-default.xml files on the classpath, shouldn't one of them be loaded, instead of

Re: Spark build error

2015-11-17 Thread Ted Yu
Is the Scala version in Intellij the same as the one used by sbt? Cheers On Tue, Nov 17, 2015 at 6:45 PM, 金国栋 wrote: > Hi! > > I tried to build spark source code from github, and I successfully built > it from command line using `sbt/sbt assembly`. However, I encountered an >

Re: Spark build error

2015-11-17 Thread Jeff Zhang
This has also bothered me for a long time. I suspect the IntelliJ builder conflicts with the sbt/maven builder. I resolved this issue by rebuilding Spark in IntelliJ. You may meet a compilation issue when building it in IntelliJ. For that you need to put external/flume-sink/target/java on the source build

Re: Issue while Spark Job fetching data from Cassandra DB

2015-11-17 Thread Ted Yu
Have you considered polling the Cassandra mailing list? A brief search led to CASSANDRA-7894. FYI On Tue, Nov 17, 2015 at 7:24 PM, satish chandra j wrote: > Hi All, > I am getting an "UnauthorizedException: User has no SELECT > permission on or any of its parents"

Re: how can evenly distribute my records in all partition

2015-11-17 Thread Sonal Goyal
Think about how you want to distribute your data and how your keys are spread currently. Do you want to compute something per day, per week etc. Based on that, return a partition number. You could use mod 30 or some such function to get the partitions. On Nov 18, 2015 5:17 AM, "prateek arora"
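
To make that concrete, a minimal custom Partitioner sketch (it assumes Long keys; adapt getPartition to however your keys should map to partitions):

    import org.apache.spark.Partitioner

    class EvenPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int = {
        val p = (key.asInstanceOf[Long] % parts).toInt
        if (p < 0) p + parts else p  // keep the result non-negative
      }
    }
    // usage: rdd.partitionBy(new EvenPartitioner(30))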

Re: spark-submit stuck and no output in console

2015-11-17 Thread Kayode Odeyemi
Anyone experienced this issue as well? On Mon, Nov 16, 2015 at 8:06 PM, Kayode Odeyemi wrote: > > Or are you saying that the Java process never even starts? > > > Exactly. > > Here's what I got back from jstack as expected: > > hadoop-user@yks-hadoop-m01:/usr/local/spark/bin$

synchronizing streams of different kafka topics

2015-11-17 Thread Antony Mayi
Hi, I have two streams coming from two different kafka topics. The two topics contain time-related events but are quite asymmetric in volume. I would obviously need to process them in sync to get the time-related events together, but with the same processing rate, if the heavier stream starts

Re: spark-submit stuck and no output in console

2015-11-17 Thread Sonal Goyal
How did the example spark jobs go? SparkPi etc.? Best Regards, Sonal

Off-heap memory usage of Spark Executors keeps increasing

2015-11-17 Thread b.schopman
Hi, The off-heap memory usage of the 3 Spark executor processes keeps increasing constantly until the boundaries of the physical RAM are hit. This happened two weeks ago, at which point the system came to a grinding halt because it was unable to spawn new processes. At such a moment restarting

Re: thought experiment: use spark ML to real time prediction

2015-11-17 Thread Nick Pentreath
I think the issue with pulling in all of spark-core is often with dependencies (and versions) conflicting with the web framework (or Akka in many cases). Plus it really is quite heavy if you just want a fairly lightweight model-serving app. For example we've built a fairly simple but scalable ALS

Re: Distributing Python code packaged as tar balls

2015-11-17 Thread Praveen Chundi
Thank you for the reply. I am using zip files for now. The documentation for 1.5.2 mentions the use of zip files or eggs; maybe a note that tars are not supported might be helpful to some. https://spark.apache.org/docs/latest/submitting-applications.html Best Regards, Praveen On 14.11.2015 00:40,
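
For reference, the supported route (the file names below are placeholders):

    spark-submit --py-files deps.zip,more_deps.egg main.py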

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Oh, one thing I missed is, I built Spark 1.4.1 Cluster with 6 nodes of Mesos 0.22.1 H/A (via ZK) cluster. 2015-11-17 18:01 GMT+09:00 임정택 : > Hi all, > > I'm evaluating zeppelin to run driver which interacts with HBase. > I use fat jar to include HBase dependencies, and see

Re: YARN Labels

2015-11-17 Thread Steve Loughran
One of our clusters runs on AWS with a portion of the nodes being spot nodes. We would like to force the application master not to run on spot nodes. For whatever reason, the application master is not able to recover in cases where the node it was running on suddenly disappears, which is the case

Re: Spark Job is getting killed after certain hours

2015-11-17 Thread Steve Loughran
On 17 Nov 2015, at 02:00, Nikhil Gs wrote: Hello Team, Below is the error which we are facing in our cluster after 14 hours of starting the spark submit job. Not able to understand the issue and why it's facing the below error

zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Hi all, I'm evaluating zeppelin to run a driver which interacts with HBase. I use a fat jar to include the HBase dependencies, and see failures at the executor level. I thought it was zeppelin's issue, but it fails on spark-shell, too. I loaded the fat jar via the --jars option, > ./bin/spark-shell --jars

Re: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
I just made it work from both sides (zeppelin, spark-shell) by initializing another SparkContext and running. But since this feels like a workaround, I'd love to get a proper way (or a more beautiful workaround) to resolve this. Please let me know if you have any suggestions. Best, Jungtaek Lim

Re: spark-submit stuck and no output in console

2015-11-17 Thread Kayode Odeyemi
Thanks for the reply Sonal. I'm on JDK 7 (/usr/lib/jvm/java-7-oracle). My env is a YARN cluster made of 7 nodes (6 datanodes/node managers, 1 namenode/resource manager). The namenode is where I executed the spark-submit job, while on one of the datanodes I executed 'hadoop fs -put /binstore

Re: spark-submit stuck and no output in console

2015-11-17 Thread Sonal Goyal
Could it be JDK related? Which version are you on? Best Regards, Sonal

Re: spark-submit stuck and no output in console

2015-11-17 Thread Steve Loughran
On 17 Nov 2015, at 09:54, Kayode Odeyemi wrote: Initially, I submitted 2 jobs to the YARN cluster which were running for 2 days and suddenly stopped. Nothing in the logs shows the root cause. 48 hours is one of those kerberos warning times (as is

How to return a pair RDD from an RDD that has foreachPartition applied?

2015-11-17 Thread swetha
Hi, How do I return an RDD of key/value pairs from an RDD that has foreachPartition applied? I have my code something like the following. It looks like an RDD that has foreachPartition applied can only have Unit as the return type. How do I apply foreachPartition, do a save, and at the same time return a pair
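
One way around this, sketched under the assumption of an RDD of pairs named input and a user-supplied save helper: use mapPartitions instead, which can perform the side effect and still return the records:

    val result = input.mapPartitions { iter =>
      val records = iter.toList
      save(records)     // side effect once per partition (save is user-supplied)
      records.iterator  // return the pairs so downstream stages can use them
    }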

Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Timothy Chen
Hi Jo, Thanks for the links. I would have expected the properties to be in the scheduler properties, but I need to double check. I'll be looking into these problems this week. Tim On Tue, Nov 17, 2015 at 10:28 AM, Jo Voordeckers wrote: > On Tue, Nov 17, 2015 at 5:16 AM, Iulian

Re: Is there a way to delete task history besides using a ttl?

2015-11-17 Thread Jonathan Coveney
reading the code, is there any reason why setting spark.cleaner.ttl.MAP_OUTPUT_TRACKER directly won't get picked up? 2015-11-17 14:45 GMT-05:00 Jonathan Coveney : > so I have the following... > > broadcast some stuff > cache an rdd > do a bunch of stuff, eventually calling

Re: thought experiment: use spark ML to real time prediction

2015-11-17 Thread DB Tsai
I was thinking about working on a better version of PMML (JMML, in JSON), but as you said, this requires a dedicated team to define the standard, which would be a huge amount of work. However, options b) and c) still don't address the distributed-models issue. In fact, most of the models in production have to be

Re: WARN LoadSnappy: Snappy native library not loaded

2015-11-17 Thread Andy Davidson
I forgot to mention. I am using spark-1.5.1-bin-hadoop2.6 From: Andrew Davidson Date: Tuesday, November 17, 2015 at 2:26 PM To: "user @spark" Subject: Re: WARN LoadSnappy: Snappy native library not loaded > FYI > > After 17 min. only

Any way to get raw score from MultilayerPerceptronClassificationModel ?

2015-11-17 Thread Robert Dodier
Hi, I'd like to get the raw prediction score from a MultilayerPerceptronClassificationModel. It appears that the 'predict' method only returns the argmax, i.e. the index of the largest score in the output layer (line 200 in MultilayerPerceptronClassificationModel.scala in Spark 1.5.2). Is there any way to get

Re: Working with RDD from Java

2015-11-17 Thread Bryan Cutler
Hi Ivan, Since Spark 1.4.1 there is a Java-friendly function in LDAModel to get the topic distributions called javaTopicDistributions() that returns a JavaPairRDD. If you aren't able to upgrade, you can check out the conversion used here
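
Sketched, for both options (ldaModel is assumed to be an already-trained DistributedLDAModel):

    // Spark 1.4.1+: the Java-friendly accessor
    val javaDists = ldaModel.javaTopicDistributions()
    // pre-1.4.1: wrap the Scala RDD by hand
    val wrapped = org.apache.spark.api.java.JavaPairRDD.fromRDD(ldaModel.topicDistributions)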

Invocation of StreamingContext.stop() hangs in 1.5

2015-11-17 Thread jiten
Hi, We're using Spark 1.5 streaming. We've a use case where we need to stop an existing StreamingContext and start a new one primarily to handle a newly added partition to Kafka topic by creating a new Kafka DStream in the context of the new StreamingContext. We've implemented

Re: WARN LoadSnappy: Snappy native library not loaded

2015-11-17 Thread Andy Davidson
FYI After 17 min, only 26112/228155 have succeeded. This seems very slow. Kind regards Andy From: Andrew Davidson Date: Tuesday, November 17, 2015 at 2:22 PM To: "user @spark" Subject: WARN LoadSnappy: Snappy native library not

Re: Invocation of StreamingContext.stop() hangs in 1.5

2015-11-17 Thread Ted Yu
I don't think you should call ssc.stop() in StreamingListenerBus thread. Please stop the context asynchronously. BTW I have a pending PR: https://github.com/apache/spark/pull/9741 On Tue, Nov 17, 2015 at 1:50 PM, jiten wrote: > Hi, > > We're using Spark 1.5 streaming.
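
For instance, a minimal sketch of stopping asynchronously rather than from inside the listener callback:

    // hand the stop off to a separate thread instead of blocking the listener bus
    new Thread("stop-streaming-context") {
      override def run(): Unit = ssc.stop(stopSparkContext = true, stopGracefully = true)
    }.start()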

WARN LoadSnappy: Snappy native library not loaded

2015-11-17 Thread Andy Davidson
I started a spark POC. I created an EC2 cluster on AWS using spark-ec2. I have 3 slaves. In general I am running into trouble even with small workloads. I am using IPython notebooks running on my spark cluster. Everything is painfully slow. I am using the standalone cluster manager. I noticed that

RE: spark with breeze error of NoClassDefFoundError

2015-11-17 Thread Jack Yang
So weird. Is there anything wrong with the way I made the pom file (I labelled them as provided)? Is there a missing jar I forgot to add in "--jars"? See the trace below: Exception in thread "main" java.lang.NoClassDefFoundError: breeze/storage/DefaultArrayValue at

Re: Additional Master daemon classpath

2015-11-17 Thread memorypr...@gmail.com
Have you tried using spark.driver.extraClassPath and spark.executor.extraClassPath? AFAICT these config options replace SPARK_CLASSPATH. Further info in the docs. I've had good luck with these options, and for ease of use I just set them in the spark defaults config.
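
E.g., in spark-defaults.conf (the EmrFS jar location below is a placeholder; point it at wherever the EmrFileSystem jars live on your image):

    spark.driver.extraClassPath    /usr/share/aws/emr/emrfs/lib/*
    spark.executor.extraClassPath  /usr/share/aws/emr/emrfs/lib/*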

Re: Streaming Job gives error after changing to version 1.5.2

2015-11-17 Thread Tathagata Das
Can you verify that the cluster is running the correct version of Spark, 1.5.2? On Tue, Nov 17, 2015 at 7:23 PM, swetha kasireddy wrote: > Sorry, compile makes it work locally. But the cluster > still seems to have issues with provided. Basically it > does not seem