Re: Docker configuration for akka spark streaming

2016-03-14 Thread David Gomez Saavedra
I have updated the config since I realized the actor system was listening on driver port + 1, so I changed the ports in my program and the Docker images: val conf = new SparkConf() .setMaster(sparkMaster) //.setMaster("local[2]") .setAppName(sparkApp) .set("spark.cassandra.connection.host",
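For reference, a minimal sketch of pinning the relevant ports in SparkConf so they can be exposed by the Docker images. This is not the configuration actually used above: the property names are the standard Spark 1.6 ones, but the port numbers, sparkMaster/sparkApp values and driver hostname are placeholders.

    import org.apache.spark.SparkConf

    val sparkMaster    = "spark://spark-master:7077"   // placeholder
    val sparkApp       = "streaming-app"               // placeholder
    val driverHostname = "spark-driver"                // must resolve from the worker containers

    val conf = new SparkConf()
      .setMaster(sparkMaster)
      .setAppName(sparkApp)
      .set("spark.driver.host", driverHostname)   // hostname the executors use to reach the driver
      .set("spark.driver.port", "7001")           // the akka driver port -- EXPOSE/-p this in the driver image
      .set("spark.fileserver.port", "7002")
      .set("spark.broadcast.port", "7003")
      .set("spark.blockManager.port", "7004")     // also needs to be reachable on the executor containers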

sparkR issues ?

2016-03-14 Thread roni
Hi, I am working in bioinformatics and trying to convert some scripts to sparkR to fit into other Spark jobs. I tried a simple example from a bioinformatics library, and as soon as I start the sparkR environment it does not work. Code as follows - countData <- matrix(1:100,ncol=4) condition <-

RE: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-14 Thread Mukul Gupta
Thanks, the behavior is now clear to me. I tried with "foreachRDD" and indeed all partitions are being processed in parallel. I also tried using "saveAsTextFile" instead of print, and again all partitions were processed in parallel. -Original Message- From: Cody Koeninger

[How To:] Custom Logging of Spark Scala scripts

2016-03-14 Thread Divya Gehlot
Hi, Can somebody point out how I can configure custom logs for my Spark (Scala) scripts, so that I can see at which level my script failed and why? Thanks, Divya
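One minimal sketch of wiring this up, assuming the default log4j backend that Spark 1.x ships with; the object name and log messages are made up for illustration.

    import org.apache.log4j.{Level, Logger}
    import org.apache.spark.{SparkConf, SparkContext}

    object MyScript {                                          // hypothetical script name
      @transient lazy val log = Logger.getLogger(getClass.getName)

      def main(args: Array[String]): Unit = {
        log.setLevel(Level.INFO)
        val sc = new SparkContext(new SparkConf().setAppName("MyScript"))
        try {
          log.info("step 1: loading input")
          // ... transformations ...
          log.info("step 1 finished")
        } catch {
          case e: Exception =>
            log.error("step 1 failed", e)                      // records which step failed and why
            throw e
        } finally {
          sc.stop()
        }
      }
    }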

Documentation for "hidden" RESTful API for submitting jobs (not history server)

2016-03-14 Thread Hyukjin Kwon
Hi all, While googling Spark, I accidentally found a RESTful API existing in Spark for submitting jobs. The link is here, http://arturmkrtchyan.com/apache-spark-hidden-rest-api As Josh said, I can see the history of this RESTful API, https://issues.apache.org/jira/browse/SPARK-5388 and also

Compare a column in two different tables/find the distance between column data

2016-03-14 Thread Suniti Singh
Hi All, I have two tables with the same schema but different data. I have to join the tables based on one column and then do a group by on the same column name. Now the data in that column in the two tables might or might not match exactly. (Ex - column name is "title". Table1. title = "doctor" and Table2.
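One hedged way to sketch this in Spark 1.6, assuming "distance" means string edit distance; df1/df2 stand in for the two tables and are not from the original post. It joins on an approximate match of title using functions.levenshtein and then groups:

    import org.apache.spark.sql.functions.{col, levenshtein, lower}

    // df1, df2: DataFrames loaded from Table1 and Table2 (hypothetical names)
    val joined = df1.as("t1").join(
      df2.as("t2"),
      levenshtein(lower(col("t1.title")), lower(col("t2.title"))) <= 2)   // tolerate small differences
    val counts = joined.groupBy(col("t1.title")).count()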

Re: Failing MiMa tests

2016-03-14 Thread Gayathri Murali
Here is the PR: https://github.com/apache/spark/pull/11544 On Mon, Mar 14, 2016 at 7:26 PM, Ted Yu wrote: > Please refer to JIRAs which were related to MiMa > e.g. > [SPARK-13834][BUILD] Update sbt and sbt plugins for 2.x. > > It would be easier for other people to help

Re: Failing MiMa tests

2016-03-14 Thread Nan Zhu
I guess it’s Jenkins’ problem? My PR failed MiMa but still got a message from SparkQA (https://github.com/SparkQA) saying that "This patch passes all tests." I checked Jenkins’ history; there are other PRs with the same issue…. Best, -- Nan Zhu http://codingcat.me On Monday,

Re: Failing MiMa tests

2016-03-14 Thread Ted Yu
Please refer to JIRAs related to MiMa, e.g. [SPARK-13834][BUILD] Update sbt and sbt plugins for 2.x. It would be easier for other people to help if you provide a link to your PR. Cheers On Mon, Mar 14, 2016 at 7:22 PM, Gayathri Murali < gayathri.m.sof...@gmail.com> wrote: > Hi All, > >

Failing MiMa tests

2016-03-14 Thread Gayathri Murali
Hi All, I recently submitted a patch (which was passing all tests) with some minor modifications to an existing PR. This patch is failing MiMa tests. Locally it passes all unit and style-check tests. How do I fix MiMa test failures? Thanks, Gayathri

Re: mesos spark cluster mode error

2016-03-14 Thread sjk
When I change to the default coarse-grained mode, it’s OK. > On Mar 14, 2016, at 21:55, sjk wrote: > > Hi all, when I run the task on Mesos, the task errors as below. Thanks a lot for any help. > > > cluster mode, command: > > $SPARK_HOME/spark-submit --class com.xxx.ETL --master >

Re: Hive query works in spark-shell not spark-submit

2016-03-14 Thread Mich Talebzadeh
This should work. Create your sbt file first cat PrintAllDatabases.sbt name := "PrintAllDatabases" version := "1.0" scalaVersion := "2.10.5" libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0" libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.0" libraryDependencies

Hive query works in spark-shell not spark-submit

2016-03-14 Thread rhuang
Hi all, I have several Hive queries that work in spark-shell, but they don't work in spark-submit. In fact, I can't even show all databases. The following works in spark-shell: import org.apache.spark._ import org.apache.spark.sql._ object ViewabilityFetchInsertDailyHive { def main() {
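A common cause is that spark-shell gives you a Hive-enabled sqlContext automatically, while a spark-submit application has to create its own HiveContext (and depend on spark-hive). A minimal sketch along those lines, not the original code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object ViewabilityFetchInsertDailyHive {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ViewabilityFetchInsertDailyHive"))
        val hiveContext = new HiveContext(sc)    // a plain SQLContext cannot see the Hive metastore
        hiveContext.sql("show databases").show()
        sc.stop()
      }
    }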

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread Tristan Nixon
I see - so you want the dependencies pre-installed on the cluster nodes so they do not need to be submitted along with the job jar? Where are you planning on deploying/running spark? Do you have your own cluster or are you using AWS/other IaaS/PaaS provider? Somehow you’ll need to get the

Re: Changing number of workers for benchmarking purposes

2016-03-14 Thread Kalpit Shah
I think "SPARK_WORKER_INSTANCES" is deprecated. This should work: "export SPARK_EXECUTOR_INSTANCES=2" -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Changing-number-of-workers-for-benchmarking-purposes-tp2606p26491.html Sent from the Apache Spark User

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread prateek arora
Hi, I do not want to create a single jar that contains all the other dependencies, because it will increase the size of my Spark job jar. So I want to copy all libraries to the cluster using some automation process, just like I am currently doing with Chef. But I am not sure whether that is the right method or not?

Re: Can we use spark inside a web service?

2016-03-14 Thread Evan Chan
Andres, A couple points: 1) If you look at my post, you can see that you could use Spark for low-latency - many sub-second queries could be executed in under a second, with the right technology. It really depends on "real time" definition, but I believe low latency is definitely possible. 2)

Re: shuffle in spark

2016-03-14 Thread Jules Damji
Hello Ashok, I found three sources on how shuffle works (and what transformations trigger it) instructive and illuminating. After learning from them, you should be able to extrapolate how your particular, practical use case would work.

Re: Can we use spark inside a web service?

2016-03-14 Thread Evan Chan
At least for simple queries, the DAGScheduler does not appear to be the bottleneck - since we are able to schedule 700 queries, and all the scheduling is probably done from the main application thread. However, I did have high hopes for Sparrow. What was the reason they decided not to include

Re: Docker configuration for akka spark streaming

2016-03-14 Thread Shixiong(Ryan) Zhu
Could you use netstat to show the ports that the driver is listening on? On Mon, Mar 14, 2016 at 1:45 PM, David Gomez Saavedra wrote: > hi everyone, > > I'm trying to set up spark streaming using akka with a similar example of > the word count provided. When using spark master

shuffle in spark

2016-03-14 Thread Ashok Kumar
Experts, please, I need to understand how shuffling works in Spark and which parameters influence it. I am sorry, but my knowledge of shuffling is very limited. I need a practical use case if you can provide one. Regards

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread Jakob Odersky
Have you tried setting the configuration `spark.executor.extraLibraryPath` to point to a location where your .so's are available? (Not sure if non-local files, such as HDFS, are supported) On Mon, Mar 14, 2016 at 2:12 PM, Tristan Nixon wrote: > What build system are you
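As a sketch of that suggestion (the paths are placeholders and must already exist on every worker node, e.g. pushed there by Chef as described earlier in the thread):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("opencv-job")                                       // placeholder name
      .set("spark.executor.extraLibraryPath", "/opt/opencv/lib")      // directory holding the .so files
      .set("spark.executor.extraClassPath", "/opt/opencv/jars/*")     // pre-installed third-party jars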

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread Tristan Nixon
What build system are you using to compile your code? If you use a dependency management system like maven or sbt, then you should be able to instruct it to build a single jar that contains all the other dependencies, including third-party jars and .so’s. I am a maven user myself, and I use the

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Michael Armbrust
On Mon, Mar 14, 2016 at 1:30 PM, Prabhu Joseph wrote: > > Thanks for the recommendation. But can you share what improvements were > made after Spark 1.2.1, and which of them specifically handle the > issue observed here? > Memory used for query execution is

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread prateek arora
Hi, Thanks for the information. But my problem is: if I want to write a Spark application which depends on third-party libraries like OpenCV, then what is the best approach to distribute all the .so and jar files of OpenCV across the cluster? Regards Prateek -- View this message in context:

Docker configuration for akka spark streaming

2016-03-14 Thread David Gomez Saavedra
Hi everyone, I'm trying to set up Spark Streaming using akka with an example similar to the word count provided. When using the Spark master in local mode everything works, but when I try to run the driver and executors using Docker I get the following exception: 16/03/14 20:32:03 WARN

Re: Changing number of workers for benchmarking purposes

2016-03-14 Thread lisak
Hey, I'm using this setup on a single m4.4xlarge node in order to utilize it: https://github.com/gettyimages/docker-spark/blob/master/docker-compose.yml but setting: SPARK_WORKER_INSTANCES: 2 SPARK_WORKER_CORES: 2 still creates only one worker. One JVM process that utilizes up

Re: Exceptions when accessing Spark metrics with REST API

2016-03-14 Thread Boric Tan
I saw that JIRA too. But not sure if they are related, since the JIRA mentioned "I got an exception when accessing the below REST API with an unknown application Id.". While in my case, a "known" application ID was supplied. Anyway, I guess I can try 1.6.1 to double check. Thanks, Ted! On Mon,

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Prabhu Joseph
Michael, Thanks for the recommendation. But can you share what improvements were made after Spark 1.2.1, and which of them specifically handle the issue observed here? On Tue, Mar 15, 2016 at 12:03 AM, Jörn Franke wrote: > I am not sure about this. At least

Re: Exceptions when accessing Spark metrics with REST API

2016-03-14 Thread Ted Yu
See the following which is in 1.6.1: [SPARK-12399] Display correct error message when accessing REST API with an unknown app Id On Mon, Mar 14, 2016 at 1:16 PM, Boric Tan wrote: > I was using 1.6.0. Sorry I forgot to mention that. > > The full stack is shown below. >

Re: Exceptions when accessing Spark metrics with REST API

2016-03-14 Thread Boric Tan
I was using 1.6.0. Sorry I forgot to mention that. The full stack is shown below. HTTP ERROR 500 Problem accessing /api/v1/applications/application_1457544696648_0002/jobs. Reason: Server Error Caused by: org.spark-project.guava.util.concurrent.UncheckedExecutionException:

Spark-submit, Spark 1.6, how to get status of Job?

2016-03-14 Thread Emmanuel
Hello, When I used to submit a job with Spark 1.4, it would return a job ID and a status: RUNNING, FAILED or something like this. I just upgraded to 1.6 and there is no status returned by spark-submit. Is there a way to get this information back? When I submit a job I want to know which one it

Re: Exceptions when accessing Spark metrics with REST API

2016-03-14 Thread Ted Yu
Which Spark release do you use ? For NoSuchElementException, was there anything else in the stack trace ? Thanks On Mon, Mar 14, 2016 at 12:12 PM, Boric Tan wrote: > Hi there, > > I was trying to access application information with REST API. Looks like > the > top

Exceptions when accessing Spark metrics with REST API

2016-03-14 Thread Boric Tan
Hi there, I was trying to access application information with the REST API. Looks like the top-level application information can be retrieved successfully, as shown below, but jobs/stages information cannot be retrieved; an exception was returned. Does anyone have any ideas on how to fix it? Thanks! Top

Reading an RDD from a checkpoint.

2016-03-14 Thread Daniel Imberman
So I'm attempting to pre-compute my data such that I can pull an RDD from a checkpoint. However, I'm finding that upon running the same job twice the system is simply recreating the RDD from scratch. Here is the code I'm implementing to create the checkpoint: def

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Jörn Franke
I am not sure about this. At least Hortonworks provides its distribution with Hive and Spark 1.6 > On 14 Mar 2016, at 09:25, Mich Talebzadeh wrote: > > I think the only version of Spark that works OK with Hive (Hive on Spark > engine) is version 1.3.1. I also get

Re: [MARKETING] Spark Streaming stateful transformation mapWithState function getting error scala.MatchError: [Ljava.lang.Object]

2016-03-14 Thread Vinti Maheshwari
Hi Iain, Thanks for your reply. Actually I changed my trackStateFunc; it's working now. For reference my working code with mapWithState: def trackStateFunc(batchTime: Time, key: String, value: Option[Array[Long]], state: State[Array[Long]]) : Option[(String, Array[Long])] = { // Check if
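The preview cuts the function off. For readers following along, here is a minimal sketch of a mapWithState tracking function with this signature; the element-wise accumulation is an assumption, not necessarily the actual logic from the thread.

    import org.apache.spark.streaming.{State, StateSpec, Time}

    def trackStateFunc(batchTime: Time, key: String, value: Option[Array[Long]],
                       state: State[Array[Long]]): Option[(String, Array[Long])] = {
      val previous = state.getOption.getOrElse(Array.empty[Long])
      val updated = value match {
        case Some(v) => previous.zipAll(v, 0L, 0L).map { case (a, b) => a + b }  // element-wise sum
        case None    => previous
      }
      state.update(updated)
      Some((key, updated))                  // emit the running totals for this key
    }

    // wiring, where stream is a DStream[(String, Array[Long])]:
    // val stateStream = stream.mapWithState(StateSpec.function(trackStateFunc _))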

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Michael Armbrust
+1 to upgrading Spark. 1.2.1 has none of the memory management improvements that were added in 1.4-1.6. On Mon, Mar 14, 2016 at 2:03 AM, Prabhu Joseph wrote: > The issue is the query hits OOM on a Stage when reading Shuffle Output > from previous stage.How come

Re: Spark SQL / Parquet - Dynamic Schema detection

2016-03-14 Thread Michael Armbrust
> > Each json file is of a single object and has the potential to have > variance in the schema. > How much variance are we talking? JSON->Parquet is going to do well with 100s of different columns, but at 10,000s many things will probably start breaking.

Re: Can someone fix this download URL?

2016-03-14 Thread Michael Armbrust
Yeah, sorry. I'll make sure this gets fixed. On Mon, Mar 14, 2016 at 12:48 AM, Sean Owen wrote: > Yeah I can't seem to download any of the artifacts via the direct download > / cloudfront URL. The Apache mirrors are fine, so use those for the moment. > @marmbrus were you

Re: OOM Exception in my spark streaming application

2016-03-14 Thread Chitturi Padma
*Something like below ...* *Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66) *

Re: YARN process with Spark

2016-03-14 Thread Alexander Pivovarov
As in Hadoop 2.5.1 of MapR 4.1.0, virtual memory checker is disabled while physical memory checker is enabled by default. Since on Centos/RHEL 6 there are aggressive allocation of virtual memory due to OS behavior, you should disable virtual memory checker or increase

Spark SQL / Parquet - Dynamic Schema detection

2016-03-14 Thread Anthony Andras
Hello there, I am trying to write a program in Spark that is attempting to load multiple json files (with undefined schemas) into a dataframe and then write it out to a parquet file. When doing so, I am running into a number of garbage collection issues as a result of my JVM running out of heap

Re: Problem running JavaDirectKafkaWordCount

2016-03-14 Thread Cody Koeninger
Sounds like the jar you built doesn't include the dependencies (in this case, the spark-streaming-kafka subproject). When you use spark-submit to submit a job to spark, you need to either specify all dependencies as additional --jars arguments (which is a pain), or build an uber-jar containing
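A hedged sbt sketch of the uber-jar route; the versions are illustrative only, and the key point is marking Spark itself as "provided" so that only the extras (such as spark-streaming-kafka) get bundled.

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

    // build.sbt
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"            % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-streaming"       % "1.6.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0"
    )

    // then run `sbt assembly` and pass the assembled jar to spark-submit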

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-14 Thread Cody Koeninger
So what's happening here is that print() uses take(). Take() will try to satisfy the request using only the first partition of the rdd, then use other partitions if necessary. If you change to using something like foreach processed.foreachRDD(new VoidFunction() {
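In Scala terms, a sketch of the difference (the element type of processed is an assumption):

    // print() calls take(), which reads as few partitions as it can get away with.
    // An action such as foreachRDD + foreachPartition schedules one task per partition instead:
    processed.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        records.foreach(println)    // runs on the executors, all partitions in parallel
      }
    }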

Reducing multiple values

2016-03-14 Thread Kevin Mc Ghee
Hi all, For each record I’m processing in a Spark streaming app (written in Java) I need to take over 30 datapoints. The output of my map would be something like: KEY1,1,0,1,0,30,1,1,1,1,0,30,… KEY1,0,1,1,0,15,1,1,1,1,0,28,… KEY2,0,1,1,0,22,1,1,1,1,0,0,… And I want to end up with:
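One hedged way to express this, assuming the goal is an element-wise per-key aggregate and the datapoints are carried as an Array[Long]; the original thread is in Java, but the sketch below is Scala.

    // pairs: DStream[(String, Array[Long])], e.g. ("KEY1", Array(1, 0, 1, 0, 30, 1, 1))
    val totals = pairs.reduceByKey { (a, b) =>
      a.zip(b).map { case (x, y) => x + y }   // element-wise sum; swap in min/max/avg logic as needed
    }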

Re: OOM Exception in my spark streaming application

2016-03-14 Thread adamreith
What do you mean? I've pasted the output in the same format used by Spark... -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OOM-Exception-in-my-spark-streaming-application-tp26479p26483.html Sent from the Apache Spark User List mailing list archive at

Re: Spark session dies in about 2 days: HDFS_DELEGATION_TOKEN token can'tbe found

2016-03-14 Thread Nikhil Gs
Mine is the same scenario. I get the HDFS_DELEGATION_TOKEN issue exactly 7 days after the Spark job started, and it then gets killed. I'm also looking for a solution. Regards, Nik. On Fri, Mar 11, 2016 at 8:10 PM, Ruslan Dautkhanov wrote: >

Spark Streaming Twiiter and Standalone cluster

2016-03-14 Thread palamaury
Hi, I have an issue using Spark Streaming with a Spark Standalone cluster: my job is submitted fine but the workers seem to be unreachable. To build the project I'm using sbt-assembly. My version of Spark is 1.6.0. Here is my streaming conf: val sparkConf = new SparkConf()

Re: append rows to dataframe

2016-03-14 Thread Ted Yu
Summarizing an offline message: The following worked for Divya: dffiltered = dffiltered.unionAll(dfresult.filter ... On Mon, Mar 14, 2016 at 5:54 AM, Lohith Samaga M wrote: > If all sql results have same set of columns you could UNION all the > dataframes > > Create

mesos spark cluster mode error

2016-03-14 Thread sjk
Hi all, when I run the task on Mesos, the task errors as below. Thanks a lot for any help. cluster mode, command: $SPARK_HOME/spark-submit --class com.xxx.ETL --master mesos://192.168.191.116:7077 --deploy-mode cluster --supervise --driver-memory 2G --executor-memory 10G --total-executor-cores 4

Re: OOM Exception in my spark streaming application

2016-03-14 Thread Bryan Jeffrey
Steve & Adam, I would be interested in hearing the outcome here as well. I am seeing some similar issues in my 1.4.1 pipeline, using stateful functions (reduceByKeyAndWindow and updateStateByKey). Regards, Bryan Jeffrey On Mon, Mar 14, 2016 at 6:45 AM, Steve Loughran

RE: append rows to dataframe

2016-03-14 Thread Lohith Samaga M
If all SQL results have the same set of columns, you could UNION all the dataframes: create an empty df and union all, then reassign the new df to the original df before the next union all. Not sure if it is a good idea, but it works. Lohith Sent from my Sony Xperia™ smartphone Divya Gehlot wrote Hi,
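A minimal sketch of that approach in Spark 1.x; the sequence of per-query dataframes (results), sqlContext and sc are placeholders, and only the empty-frame-plus-unionAll pattern is the point.

    import org.apache.spark.sql.Row

    // results: Seq[DataFrame], all sharing the same schema
    var combined = sqlContext.createDataFrame(sc.emptyRDD[Row], results.head.schema)
    for (df <- results) {
      combined = combined.unionAll(df)   // reassign before the next unionAll, as described above
    }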

Printing MLpipeline model in Python.

2016-03-14 Thread VISHNU SUBRAMANIAN
HI All, I am using Spark 1.6 and Pyspark. I am trying to build a Randomforest classifier model using mlpipeline and in python. When I am trying to print the model I get the below value. RandomForestClassificationModel (uid=rfc_be9d4f681b92) with 10 trees When I use MLLIB RandomForest model

Performance tuning of spark pipeline

2016-03-14 Thread Jatin Kumar
Hello all, I have some doubts regarding performance tuning of my pipeline. I am trying to achieve the following: 1. Consume from Kafka in 2 sec batches, filter it and remove 95% of data, which comes down to around 4K messages/sec 2. Maintain keys (strings) by frequency over a moving window of

Re: OOM Exception in my spark streaming application

2016-03-14 Thread Chitturi Padma
Hi, Can you please try to show the stack trace line by line, because it's a bit difficult to read the entire paragraph and make sense out of it. On Mon, Mar 14, 2016 at 3:11 PM, adamreith [via Apache Spark User List] < ml-node+s1001560n26479...@n3.nabble.com> wrote: > Hi, > > I'm using spark

Re: Why KMeans with mllib is so slow ?

2016-03-14 Thread Priya Ch
Hi Xi Shen, Changing the initialization step from "kmeans||" to "random" decreased the execution time from 2 hrs to 6 min. However, by default the no. of runs is 1. If I try to set the number of runs to 10, then I again see an increase in job execution time. How do I proceed on this? By the way, how
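For reference, these are the knobs being discussed on the MLlib side (Spark 1.6 API); k, the iteration count and the input RDD are placeholders.

    import org.apache.spark.mllib.clustering.KMeans

    // data: RDD[org.apache.spark.mllib.linalg.Vector]
    val model = new KMeans()
      .setK(100)
      .setMaxIterations(20)
      .setInitializationMode(KMeans.RANDOM)   // the alternative is KMeans.K_MEANS_PARALLEL ("k-means||")
      .setRuns(1)                             // each additional run repeats the clustering, so 10 runs costs ~10x
      .run(data)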

Re: OOM Exception in my spark streaming application

2016-03-14 Thread Steve Loughran
> On 14 Mar 2016, at 09:41, adamreith wrote: > > I dumped the heap of the driver process and seems that 486.2 MB on 512 MB of > the available memory is used by an instance of the class > /org.apache.spark.deploy.yarn.history.YarnHistoryService/. I'm trying to > figure out

Re: YARN process with Spark

2016-03-14 Thread Steve Loughran
On 11 Mar 2016, at 23:01, Alexander Pivovarov > wrote: Forgot to mention. To avoid unnecessary container termination add the following setting to yarn yarn.nodemanager.vmem-check-enabled = false That can kill performance on a shared cluster:

OOM Exception in my spark streaming application

2016-03-14 Thread adamreith
Hi, I'm using Spark 1.4.1 and I have a simple application that creates a dstream that reads data from Kafka and applies a filter transformation to it. After more or less a day it throws the following exception: /Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space

Possible memory leak

2016-03-14 Thread adamreith
Hi, I'm using Spark 1.4.1 and I have a simple application that creates a dstream that reads data from Kafka and applies a filter transformation to it. After more or less a day it throws the following exception: /Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space

Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2016-03-14 Thread nir
For uniform partitioning, you can try custom Partitioner. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-Requested-array-size-exceeds-VM-limit-tp16809p26477.html Sent from the Apache Spark User List mailing list archive at
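A minimal sketch of what such a Partitioner looks like; the bucketing rule here is just hashCode modulo, a stand-in for whatever actually spreads your keys evenly.

    import org.apache.spark.Partitioner

    class UniformPartitioner(partitions: Int) extends Partitioner {
      require(partitions > 0)
      override def numPartitions: Int = partitions
      override def getPartition(key: Any): Int = {
        val raw = key.hashCode % partitions
        if (raw < 0) raw + partitions else raw   // keep the result non-negative
      }
      override def equals(other: Any): Boolean = other match {
        case p: UniformPartitioner => p.numPartitions == numPartitions
        case _                     => false
      }
    }

    // usage: pairRdd.partitionBy(new UniformPartitioner(64))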

How to Catch Spark Streaming Twitter Exception ( Written Java)

2016-03-14 Thread Soni spark
Dear All, I am facing a problem with Spark Twitter Streaming code. Whenever twitter4j throws an exception, I am unable to catch it. Could anyone help me catch that exception? Here is pseudo code: SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("Test"); //

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Prabhu Joseph
The issue is that the query hits OOM on a stage when reading shuffle output from the previous stage. How come increasing shuffle memory helps to avoid OOM? On Mon, Mar 14, 2016 at 2:28 PM, Sabarish Sasidharan wrote: > Thats a pretty old version of Spark SQL. It is devoid of all

java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer"

2016-03-14 Thread Nagu Kothapalli
Hi Team, I am getting the below exceptions while running the Spark Java streaming job with a custom receiver. org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 508, not attempting to retry it. Exception during serialization: java.io.IOException:

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Sabarish Sasidharan
That's a pretty old version of Spark SQL. It is devoid of all the improvements introduced in the last few releases. You should try bumping your spark.sql.shuffle.partitions to a value higher than the default (5x or 10x). Also increase your shuffle memory fraction, as you really are not explicitly
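As a sketch, the settings being referred to in 1.2.x-era terms; the values are illustrative, not recommendations.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.sql.shuffle.partitions", "1000")   // default is 200; 5x-10x higher as suggested above
      .set("spark.shuffle.memoryFraction", "0.4")    // raise the shuffle fraction (pre-1.6 static memory model)
      .set("spark.storage.memoryFraction", "0.3")    // and give back some of the cache fraction to compensate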

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Prabhu Joseph
It is Spark SQL, and the version used is Spark 1.2.1. On Mon, Mar 14, 2016 at 2:16 PM, Sabarish Sasidharan < sabarish.sasidha...@manthan.com> wrote: > I believe the OP is using Spark SQL and not Hive on Spark. > > Regards > Sab > > On Mon, Mar 14, 2016 at 1:55 PM, Mich Talebzadeh < >

Re: Error while deploying spark 1.6.1 on EC2

2016-03-14 Thread sandesh deshmane
> I am trying to install Spark on EC2. > > I am getting the below error. I had issues like RPC timeout and fetch timeout > for Spark 1.6.0, so as per the release notes I was trying to get a new cluster with > 1.6.1. > > Can you help? Looks like the Spark 1.6.1 package is missing from S3. > > [timing] scala init:

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Sabarish Sasidharan
I believe the OP is using Spark SQL and not Hive on Spark. Regards Sab On Mon, Mar 14, 2016 at 1:55 PM, Mich Talebzadeh wrote: > I think the only version of Spark that works OK with Hive (Hive on Spark > engine) is version 1.3.1. I also get OOM from time to time and

Re: how to convert the RDD[Array[Double]] to RDD[Double]

2016-03-14 Thread Michał Zieliński
For RDDs you can use flatMap; for DataFrames, explode would be the best fit. On 14 March 2016 at 08:28, lizhenm...@163.com wrote: > > hi: > I want to convert the RDD[Array[Double]] to RDD[Double]. For example, > it stored 1.0 2.0 3.0 in the file, how do I read > >
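Minimal sketches of both suggestions; the file name and column name are placeholders.

    // RDD route: flatten each Array[Double] into individual Doubles
    val rows = sc.textFile("numbers.txt").map(_.trim.split("\\s+").map(_.toDouble))
    val flat: org.apache.spark.rdd.RDD[Double] = rows.flatMap(arr => arr)

    // DataFrame route: explode an array column
    // import org.apache.spark.sql.functions.explode
    // df.select(explode(df("values")))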

how to convert the RDD[Array[Double]] to RDD[Double]

2016-03-14 Thread lizhenm...@163.com
hi: I want to convert the RDD[Array[Double]] to RDD[Double]. For example, the file stores the rows 1.0 2.0 3.0 and 4.0 5.0 6.0; how do I read the file and convert

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Mich Talebzadeh
I think the only version of Spark that works OK with Hive (Hive on Spark engine) is version 1.3.1. I also get OOM from time to time and have to revert to using MR. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Sabarish Sasidharan
Which version of Spark are you using? The configuration varies by version. Regards Sab On Mon, Mar 14, 2016 at 10:53 AM, Prabhu Joseph wrote: > Hi All, > > A Hive Join query which runs fine and faster in MapReduce takes lot of > time with Spark and finally fails

Re: Can someone fix this download URL?

2016-03-14 Thread Sean Owen
Yeah I can't seem to download any of the artifacts via the direct download / cloudfront URL. The Apache mirrors are fine, so use those for the moment. @marmbrus were you maybe the last to deal with these artifacts during the release? I'm not sure where they are or how they get uploaded or I'd look

Re: kill Spark Streaming job gracefully

2016-03-14 Thread Shams ul Haque
Anyone have any idea? Or should I raise a bug for that? Thanks, Shams On Fri, Mar 11, 2016 at 3:40 PM, Shams ul Haque wrote: > Hi, > > I want to kill a Spark Streaming job gracefully, so that whatever Spark > has picked up from Kafka gets processed. My Spark version is:

Terminate Spark job in eclipse

2016-03-14 Thread Soni spark
Hi Friends, Can anyone help me with how to terminate a Spark job in Eclipse using Java code? Thanks, Soniya