MesosClusterDispatcher problem : Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

2016-06-27 Thread 杜宇軒
Hi all, I run Spark on a Mesos cluster and meet a problem: when I send 6 Spark drivers *at the same time*, the information on node3:8081 shows 4 drivers in "Launched Drivers" and 2 in "Queued Drivers". On mesos:5050, I can see there are 4 active tasks running, but each task

[ANNOUNCE] Announcing Spark 1.6.2

2016-06-27 Thread Reynold Xin
We are happy to announce the availability of Spark 1.6.2! This maintenance release includes fixes across several areas of Spark. You can find the list of changes here: https://s.apache.org/spark-1.6.2 And download the release here: http://spark.apache.org/downloads.html

Re: Running into issue using SparkIMain

2016-06-27 Thread Jayant Shekhar
I tried setting the classpath explicitly in the settings. The classpath gets printed properly; it has the Scala jars in it, like scala-compiler-2.10.4.jar and scala-library-2.10.4.jar. It did not help. It still runs great in IntelliJ, but runs into issues when run from the command line. val cl =
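A minimal sketch of forcing the compiler settings to pick up the application classpath when the interpreter runs from a packaged jar; only scala.tools.nsc.Settings is assumed here, and how the resulting settings are wired into SparkIMain depends on the Spark version.

    import scala.tools.nsc.Settings

    val settings = new Settings()
    settings.usejavacp.value = true                       // reuse the JVM's own classpath
    settings.embeddedDefaults(getClass.getClassLoader)    // pick up the webapp/jar classloader's classpath
    // settings is then handed to the SparkIMain constructor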

unsubscribe

2016-06-27 Thread Thomas Ginter
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Running into issue using SparkIMain

2016-06-27 Thread Jayant Shekhar
Hello, I'm trying to run Scala code in a web application. It runs great when I run it in IntelliJ, but I run into an error when I run it from the command line. Command used to run: java -Dscala.usejavacp=true -jar target/XYZ.war

Re: What is the explanation of "ConvertToUnsafe" in "Physical Plan"

2016-06-27 Thread Xinh Huynh
I guess it has to do with the Tungsten explicit memory management that builds on sun.misc.Unsafe. The "ConvertToUnsafe" class converts Java-object-based rows into UnsafeRows, which use Spark's internal memory-efficient format. Here is the related code in 1.6: ConvertToUnsafe is defined in:
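A quick way to see where the conversion shows up, assuming a Spark 1.6 SQLContext named sqlContext; whether ConvertToUnsafe/ConvertToSafe nodes appear depends on the operators involved.

    val df = sqlContext.range(0, 1000)
    df.filter(df("id") > 500).explain()   // prints the physical plan, including any ConvertToUnsafe nodes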

JavaSparkContext: dependency on ui/

2016-06-27 Thread jay vyas
I notice that there is a dependency from the SparkContext on the "createLiveUI" functionality. Is that really required? Or is there a more minimal JavaSparkContext we can create? I'm packaging a jar with a Spark client and would rather avoid resource/ dependencies as they might be trickier to
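A sketch of sidestepping the UI through configuration rather than a slimmer context; spark.ui.enabled is a standard Spark setting, while the app name and master here are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.api.java.JavaSparkContext

    val conf = new SparkConf()
      .setAppName("minimal-client")
      .setMaster("local[*]")
      .set("spark.ui.enabled", "false")   // createLiveUI is skipped when the UI is disabled
    val jsc = new JavaSparkContext(new SparkContext(conf))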

Re: Best practice for handing tables between pipeline components

2016-06-27 Thread Gene Pang
Yes, Alluxio (http://www.alluxio.org/) can be used to store data in-memory between stages in a pipeline. Here is more information about running Spark with Alluxio: http://www.alluxio.org/documentation/v1.1.0/en/Running-Spark-on-Alluxio.html Hope that helps, Gene On Mon, Jun 27, 2016 at 10:38
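A sketch of handing a table between pipeline stages through Alluxio paths; the host, port, and paths are placeholders, and the Alluxio client jar is assumed to be on the Spark classpath.

    val stage1 = sqlContext.read.parquet("alluxio://alluxio-master:19998/pipeline/stage1")
    stage1.write.parquet("alluxio://alluxio-master:19998/pipeline/stage2")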

MapWithState would not restore from checkpoint.

2016-06-27 Thread Sergey Zelvenskiy
MapWithState will not restore from the checkpoint. The MapRDD code requires a non-empty Spark context, while the context is empty. ERROR 2016-06-27 11:06:33,236 0 org.apache.spark.streaming.StreamingContext [run-main-0] Error starting the context, marking it as stopped org.apache.spark.SparkException:
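A sketch of the recovery pattern this error usually points at: build the whole streaming graph, including mapWithState, inside the factory passed to getOrCreate so no DStream captures a stale context. Paths, app name, and batch interval are placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/mapWithState-app"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("mapWithState-app")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // define the input DStream and the mapWithState pipeline here, not outside
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()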

Spark ML - Java implementation of custom Transformer

2016-06-27 Thread Mehdi Meziane
Hi all, We have some problems while implementing custom Transformers in Java (Spark 1.6.1). We do override the copy method, but it crashes with an AbstractMethodError. If we extend UnaryTransformer and do not override the copy method, it works without any error. We tried to
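A sketch, in Scala for brevity (the thread is about Java): Transformer subclasses are expected to override copy with the correct concrete return type, typically by delegating to defaultCopy. The class name and the pass-through logic are placeholders (Spark 1.6 API).

    import org.apache.spark.ml.Transformer
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.StructType

    class NoOpTransformer(override val uid: String) extends Transformer {
      def this() = this(Identifiable.randomUID("noOpTransformer"))
      override def transform(dataset: DataFrame): DataFrame = dataset          // placeholder logic
      override def transformSchema(schema: StructType): StructType = schema
      override def copy(extra: ParamMap): NoOpTransformer = defaultCopy(extra)
    }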

Re: Best practice for handing tables between pipeline components

2016-06-27 Thread Sathish Kumaran Vairavelu
Alluxio off-heap memory would help to share cached objects. On Mon, Jun 27, 2016 at 11:14 AM Everett Anderson wrote: > Hi, > > We have a pipeline of components strung together via Airflow running on > AWS. Some of them are implemented in Spark, but some aren't. Generally

RE: Utils and Logging cannot be accessed in package ....

2016-06-27 Thread Paolo Patierno
Yes, I have just realized that the code I was reading was in the org.apache.spark package, related to custom receiver implementations. Thanks. Paolo Patierno, Senior Software Engineer (IoT) @ Red Hat; Microsoft MVP on Windows Embedded & IoT; Microsoft Azure Advisor. Twitter: @ppatierno Linkedin:

Re: Utils and Logging cannot be accessed in package ....

2016-06-27 Thread Ted Yu
AFAICT Utils is private: private[spark] object Utils extends Logging { So is Logging: private[spark] trait Logging { FYI On Mon, Jun 27, 2016 at 8:20 AM, Paolo Patierno wrote: > Hello, > > I'm trying to use the Utils.createTempDir() method importing >
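A sketch of a substitute for the private Utils.createTempDir using only the JDK:

    import java.nio.file.Files

    val tempDir = Files.createTempDirectory("my-app-").toFile
    tempDir.deleteOnExit()   // note: only removes the directory itself, and only if it is empty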

Re: Arrays in Datasets (1.6.1)

2016-06-27 Thread Ted Yu
Can you show the stack trace for encoding error(s) ? Have you looked at the following test which involves NestedArray of primitive type ? ./sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala Cheers On Mon, Jun 27, 2016 at 8:50 AM, Daniel Imberman

Best practice for handing tables between pipeline components

2016-06-27 Thread Everett Anderson
Hi, We have a pipeline of components strung together via Airflow running on AWS. Some of them are implemented in Spark, but some aren't. Generally they can all talk to a JDBC/ODBC end point or read/write files from S3. Ideally, we wouldn't suffer the I/O cost of writing all the data to HDFS or

Arrays in Datasets (1.6.1)

2016-06-27 Thread Daniel Imberman
Hi all, So I've been attempting to reformat a project I'm working on to use the Dataset API and have been having some issues with encoding errors. From what I've read, I think that I should be able to store Arrays of primitive values in a dataset. However, the following class gives me encoding
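A minimal sketch of the intent, assuming a Spark 1.6 SQLContext named sqlContext; the case class and values are placeholders, not the actual class from the thread.

    case class Record(id: Long, values: Array[Int])

    import sqlContext.implicits._
    val ds = Seq(Record(1L, Array(1, 2, 3)), Record(2L, Array(4, 5))).toDS()
    ds.show()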

run spark sql with script transformation failed

2016-06-27 Thread linxi zeng
Hi, all: Recently, we have been trying to compare Spark SQL with Hive on MR, and I have tried to run Spark (Spark 1.6 RC2) SQL with script transformation. The Spark job failed with an error message like: 16/06/26 11:01:28 INFO codegen.GenerateUnsafeProjection: Code generated in 19.054534 ms
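A sketch of the Hive-style script transformation being exercised; the script and table names are placeholders, and a HiveContext named hiveContext is assumed.

    val transformed = hiveContext.sql(
      """SELECT TRANSFORM (key, value)
        |USING 'python my_script.py'
        |AS (key STRING, value STRING)
        |FROM src
      """.stripMargin)
    transformed.show()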

Utils and Logging cannot be accessed in package ....

2016-06-27 Thread Paolo Patierno
Hello, I'm trying to use the Utils.createTempDir() method, importing org.apache.spark.util.Utils, but the Scala compiler tells me that: object Utils in package util cannot be accessed in package org.apache.spark.util. I'm facing the same problem with Logging. My sbt file has the following dependency

Unsubscribe

2016-06-27 Thread Steve Florence

Spark partition formula on standalone mode?

2016-06-27 Thread kali.tumm...@gmail.com
Hi All, I have worked with Spark installed on a Hadoop cluster but never with Spark on a standalone cluster. My question: how do I set the number of partitions in Spark when it's running on a standalone cluster? With Spark on Hadoop I calculate my formula using HDFS block sizes, but how do I calculate
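A sketch of the usual knobs when there is no HDFS block size to key off of; all values are placeholders, and a common rule of thumb is 2-3 partitions per executor core.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("standalone-app")
      .set("spark.default.parallelism", "48")          // default for shuffles and parallelize
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("s3a://bucket/input", 48)    // explicit minimum number of partitions
    val adjusted = rdd.repartition(96)                 // or re-balance after loading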

GraphX :Running on a Cluster

2016-06-27 Thread isaranto
Hi, I have been trying to run some algorithms I have implemented using GraphX and Spark. I have been running these algorithms locally by starting a local Spark instance through IntelliJ (in Scala). However, when I try to run them on a cluster with 10 machines I get

Re: Substract two DStreams

2016-06-27 Thread Marius Soutier
Can't you use `transform` instead of `foreachRDD`? > On 15.06.2016, at 15:18, Matthias Niehoff > wrote: > > Hi, > > I want to subtract 2 DStreams (based on the same input stream) to get all > elements that exist in the original stream, but not in the
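A sketch of the suggestion: use transformWith to subtract one DStream from another, producing a new DStream instead of acting inside foreachRDD. The stream names and element type are placeholders.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    def subtractStreams(original: DStream[String], filtered: DStream[String]): DStream[String] =
      original.transformWith(filtered,
        (origRdd: RDD[String], filtRdd: RDD[String]) => origRdd.subtract(filtRdd))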

Spark SQL poor join performance

2016-06-27 Thread Samo Sarajevo
I'm using Spark SQL to build a fact table out of 5 dimensions. I'm facing a performance issue (the job takes several hours to complete), and even after exhaustive googling I see no solution. These are the settings I have tried tuning, but with no success.  sqlContext.sql("set
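A sketch of the levers usually tried first for a star-schema join in 1.6; the values and DataFrame names are placeholders, and broadcast() comes from org.apache.spark.sql.functions.

    import org.apache.spark.sql.functions.broadcast

    sqlContext.sql("SET spark.sql.shuffle.partitions=200")
    sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=104857600")   // ~100 MB: let small dimensions broadcast
    factDf.join(broadcast(dimDf), Seq("dim_key")).explain()                // check that a broadcast join shows up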

Querying Hive tables from Spark

2016-06-27 Thread Mich Talebzadeh
Hi, I have done some extensive tests with Spark querying Hive tables. It appears to me that Spark does not rely on the statistics that are collected by Hive on, say, ORC tables. It seems that Spark uses its own optimization to query the Hive tables, irrespective of what Hive has collected by way of

[Spark 1.6.1] Beeline cannot start on Windows7

2016-06-27 Thread Haopu Wang
I see the stack trace below when trying to run the beeline command. I'm using JDK 7. Anything wrong? Many thanks! == D:\spark\download\spark-1.6.1-bin-hadoop2.4>bin\beeline Beeline version 1.6.1 by Apache Hive Exception in thread "main" java.lang.NoSuchMethodError:

Last() Window Function

2016-06-27 Thread Anton Okolnychyi
Hi all! I am learning Spark SQL and window functions. The behavior of the last() window function was unexpected to me in one case (for someone without any previous experience with window functions). I define my window specification as follows: Window.partitionBy('transportType,
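A sketch of the usual explanation: with an orderBy, the default frame runs from the start of the partition to the current row, so last() returns the current row's value; widening the frame changes that. Apart from transportType, the column and DataFrame names are placeholders (Spark 1.6 API).

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.last

    val w = Window.partitionBy("transportType")
                  .orderBy("eventTime")
                  .rowsBetween(Long.MinValue, Long.MaxValue)   // whole partition, not just rows up to the current one
    val withLast = df.withColumn("lastValue", last("someColumn").over(w))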

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-27 Thread Amit Sela
OK. I see that, but the current (provided) implementations are very naive - Sum, Count, Average. Let's take Max for example: I guess zero() would be set to some value like Long.MIN_VALUE, but what if you trigger (I assume in the future Spark Streaming will support time-based triggers) for a result
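A sketch of the Max case under discussion, avoiding a sentinel zero like Long.MIN_VALUE by using Option as the buffer (Spark 2.0 Aggregator API; the object name is a placeholder).

    import org.apache.spark.sql.expressions.Aggregator
    import org.apache.spark.sql.{Encoder, Encoders}

    object MaxAgg extends Aggregator[Long, Option[Long], Option[Long]] {
      def zero: Option[Long] = None                                   // "no data yet", not a fake minimum
      def reduce(b: Option[Long], a: Long): Option[Long] = Some(math.max(b.getOrElse(a), a))
      def merge(b1: Option[Long], b2: Option[Long]): Option[Long] = (b1, b2) match {
        case (Some(x), Some(y)) => Some(math.max(x, y))
        case (x, y)             => x.orElse(y)
      }
      def finish(reduction: Option[Long]): Option[Long] = reduction   // None when nothing was aggregated
      def bufferEncoder: Encoder[Option[Long]] = Encoders.kryo[Option[Long]]
      def outputEncoder: Encoder[Option[Long]] = Encoders.kryo[Option[Long]]
    }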

Re: Difference between Dataframe and RDD Persisting

2016-06-27 Thread Jörn Franke
DataFrames use a more efficient binary representation to store and persist data. You should go for that one in most cases. RDDs are slower. > On 27 Jun 2016, at 07:54, Brandon White wrote: > > What is the difference between persisting a dataframe and a rdd? When I
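A sketch for comparing the two in the Storage tab of the UI; the sizes and storage levels are placeholders, and a SparkContext sc plus a Spark 1.6 SQLContext sqlContext are assumed.

    import org.apache.spark.storage.StorageLevel

    val df  = sqlContext.range(0, 1000000)
    val rdd = sc.parallelize(1L to 1000000L)

    df.persist(StorageLevel.MEMORY_ONLY)        // cached in Spark SQL's compact columnar/binary form
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)   // cached as serialized objects
    df.count(); rdd.count()                     // force materialization, then compare sizes in the UI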

Re: Spark Thrift Server Concurrency

2016-06-27 Thread Prabhu Joseph
The Spark Thrift Server is started with ./sbin/start-thriftserver.sh --master yarn-client --hiveconf hive.server2.thrift.port=10001 --num-executors 4 --executor-cores 2 --executor-memory 4G --conf spark.scheduler.mode=FAIR. The 20 parallel queries below are executed: select distinct val2 from philips1