Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Vadim Semenov
Oh, and try running even smaller executors, i.e. with `spark.executor.memory` <= 16GiB. I wonder what result you're going to get. On Sun, Oct 2, 2016 at 1:24 AM, Vadim Semenov wrote: > > Do you mean running a multi-JVM 'cluster' on the single machine? > Yes, that's
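
For reference, a minimal sketch of what that cap could look like when building the SparkConf programmatically; the exact values are illustrative assumptions, not numbers from this thread:

```scala
import org.apache.spark.SparkConf

// A minimal sketch of capping executor heaps; values are assumptions.
val conf = new SparkConf()
  .set("spark.executor.memory", "16g") // keep each executor heap <= 16 GiB
  .set("spark.executor.cores", "4")    // fewer cores per (smaller) executor
```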

Re: get different results when debugging and running scala program

2016-10-01 Thread Vadim Semenov
The question has no connection to Spark. In the future, if you use the Apache mailing lists, use external services to add screenshots, and make sure that your code is formatted so other members are able to read it. On Fri, Sep 30, 2016 at 11:25 AM, chen yong wrote: > Hello All, > >

Re: Spark on yarn enviroment var

2016-10-01 Thread Vadim Semenov
The question should be addressed to the Oozie community. As far as I remember, the Spark action doesn't support environment variables. On Fri, Sep 30, 2016 at 8:11 PM, Saurabh Malviya (samalviy) < samal...@cisco.com> wrote: > Hi, > > > > I am running spark on yarn using oozie. > > > > When submit
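
As a possible workaround (an assumption on my part, not something confirmed in this thread), variables can be passed through Spark's own configuration rather than the Oozie action:

```scala
import org.apache.spark.SparkConf

// Hypothetical workaround: pass variables through Spark's own conf
// instead of relying on the Oozie action's (missing) env-var support.
// MY_VAR is a placeholder name.
val conf = new SparkConf()
  .setExecutorEnv("MY_VAR", "some-value")              // executors
  .set("spark.yarn.appMasterEnv.MY_VAR", "some-value") // YARN AM (cluster mode)
```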

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Vadim Semenov
> Do you mean running a multi-JVM 'cluster' on the single machine? Yes, that's what I suggested. You can get some information here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ > How would that affect performance/memory-consumption? If a multi-JVM setup can
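
For context, a standalone 'cluster' on one machine is typically a master plus several workers started on the same host (e.g. via SPARK_WORKER_INSTANCES in conf/spark-env.sh); a minimal client-side sketch, with sizes as assumptions:

```scala
import org.apache.spark.SparkConf

// A sketch of pointing an application at a single-machine standalone
// cluster with several small executor JVMs; sizes are assumptions.
val conf = new SparkConf()
  .setMaster("spark://localhost:7077")
  .set("spark.executor.memory", "32g") // several small heaps, not one huge one
  .set("spark.cores.max", "16")
```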

use CrossValidatorModel for prediction

2016-10-01 Thread Pengcheng
Dear Spark Users, I was wondering: I have a trained cross-validator model *model: CrossValidatorModel* and I want to predict a score for *features: RDD[Features]*. Right now I have to convert features to a DataFrame and then perform predictions as follows: """ val sqlContext = new
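
Converting to a DataFrame is indeed the expected route, since CrossValidatorModel.transform operates on DataFrames; a hypothetical sketch of the wrapper (the Features shape and column names are assumptions):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.tuning.CrossValidatorModel

// Assumed shape: a single assembled feature vector per row.
case class Features(features: Vector)

def score(sc: SparkContext, model: CrossValidatorModel,
          rdd: RDD[Features]) = {
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._
  model.transform(rdd.toDF()) // adds the model's prediction column
}
```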

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Babak Alipour
To add one more note, I tried running more, smaller executors, each with 32-64g memory and executor.cores 2-4 (with 2 workers as well), and I'm still getting the same exception: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes at

Re: DataFrame Sort gives Cannot allocate a page with more than 17179869176 bytes

2016-10-01 Thread Babak Alipour
Do you mean running a multi-JVM 'cluster' on the single machine? How would that affect performance/memory consumption? If a multi-JVM setup can handle such a large input, then why can't a single JVM break down the job into smaller tasks? I also found that SPARK-9411 mentions making the page_size
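
If your build includes SPARK-9411, the page size can reportedly be capped via an internal, undocumented setting; treat the knob below as an assumption and verify it against your Spark version:

```scala
import org.apache.spark.SparkConf

// Assumption: SPARK-9411 exposes an internal spark.buffer.pageSize
// setting; it is undocumented, so verify it exists in your build.
val conf = new SparkConf()
  .set("spark.buffer.pageSize", "16m") // cap the Tungsten page size
```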

Re: Restful WS for Spark

2016-10-01 Thread Vadim Semenov
I worked with both, so I'll give you some insight from my perspective. spark-jobserver has a stable API and is overall mature, but it doesn't work with yarn-cluster mode, and Python support is in development right now. Livy has a stable API (but I'm not sure if I can speak for it since it has appeared

Re: Broadcast big dataset

2016-10-01 Thread Anastasios Zouzias
Hey, Is the driver running OOM? Try 8g on the driver memory. Speaking of which, how do you estimate that your broadcasted dataset is 500M? Best, Anastasios On 29.09.2016 at 5:32 AM, "WangJianfei" wrote: > First, thank you very much! > My executor memory is
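
One way to get a rough number is Spark's own SizeEstimator; a sketch (note it measures the in-JVM footprint, which can differ from the serialized size, and myDataset is a placeholder name):

```scala
import org.apache.spark.util.SizeEstimator

// Rough in-JVM footprint of the local object you are about to
// broadcast; this is the deserialized size, not the serialized one.
val bytes = SizeEstimator.estimate(myDataset) // myDataset: placeholder
println(s"estimated size: ${bytes / (1024 * 1024)} MB")
```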

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Benjamin Kim
Mich, I know that up until CDH 5.4 we had to add the HTrace jar to the classpath to make it work, using the command below. But after upgrading to CDH 5.7, it became unnecessary. echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> /etc/spark/conf/classpath.txt Hope this helps.
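
An alternative that avoids editing classpath.txt (my assumption, not something from this thread) is to point Spark's classpath settings at the same jar:

```scala
import org.apache.spark.SparkConf

// Hypothetical alternative: add the htrace jar via Spark's classpath
// settings instead of appending to classpath.txt.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath",
       "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar")
  .set("spark.executor.extraClassPath",
       "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar")
```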

Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-01 Thread Mich Talebzadeh
Trying a bulk load using HFiles in Spark, as in the example below: import org.apache.spark._ import org.apache.spark.rdd.NewHadoopRDD import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor} import org.apache.hadoop.hbase.client.HBaseAdmin import
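
For comparison, a minimal sketch of the usual HFile-writing shape (column family, qualifier, and output path are assumptions, not taken from the original code):

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// Sketch: build (row key, KeyValue) pairs, sort by row key (HFiles
// must be written in key order), then write via HFileOutputFormat2.
def writeHFiles(rdd: RDD[(String, String)]): Unit = {
  val kvs = rdd.map { case (rowKey, value) =>
    val kv = new KeyValue(Bytes.toBytes(rowKey),
      Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
    (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), kv)
  }.sortByKey()

  kvs.saveAsNewAPIHadoopFile("/tmp/hfiles",
    classOf[ImmutableBytesWritable], classOf[KeyValue],
    classOf[HFileOutputFormat2], HBaseConfiguration.create())
}
```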

Re: Deep learning libraries for scala

2016-10-01 Thread janardhan shetty
Apparently there are no neural network implementations in TensorFrames which we can use, right? Or am I missing something here? I would like to apply neural networks in an NLP setting; are there any implementations which can be looked into? On Fri, Sep 30, 2016 at 8:14 PM, Suresh Thalamati
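
One fallback inside Spark ML itself, rather than TensorFrames, is the built-in multilayer perceptron; a sketch with placeholder layer sizes:

```scala
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

// A sketch using Spark ML's feed-forward network; the layer sizes are
// placeholders for an NLP feature vector and label count.
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(Array(1000, 128, 64, 20)) // input -> hidden -> classes
  .setMaxIter(100)

// val model = mlp.fit(trainingDf) // trainingDf with assumed
//                                 // "features"/"label" columns
```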

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-10-01 Thread Sean Owen
"Compile failed via zinc server" Try shutting down zinc. Something's funny about your compile server. It's not required anyway. On Sat, Oct 1, 2016 at 3:24 PM, Marco Mistroni wrote: > Hi guys > sorry to annoy you on this but i am getting nowhere. So far i have tried to >

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-10-01 Thread Marco Mistroni
Hi guys, sorry to annoy you on this but I am getting nowhere. So far I have tried to build Spark 2.0 on my local laptop with no success, so I blamed my laptop's poor performance. So today I fired off an EC2 Ubuntu 16.06 instance and installed the following (I copy-paste the commands here)

Performance problem with BlockMatrix.add()

2016-10-01 Thread Andi
Hello, I'm implementing a PageRank-like iterative algorithm where in each iteration a number of matrix operations are performed. One step is to add two matrices that are both the result of several matrix multiplications. Unfortunately, when using the add() operation of BlockMatrix, Spark gets completely
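
One mitigation worth trying (an assumption, not a confirmed fix): materialize and cache both operands before add(), so the multiplication lineage isn't recomputed, and make sure both matrices use the same block sizes so blocks align:

```scala
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

// Sketch: cache and force the block RDDs before add(), so the
// expensive multiplication lineage is computed only once.
def cachedAdd(a: BlockMatrix, b: BlockMatrix): BlockMatrix = {
  a.blocks.cache(); b.blocks.cache()
  a.blocks.count(); b.blocks.count() // force materialization
  a.add(b)                           // requires matching block sizes
}
```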

execution sequence puzzle

2016-10-01 Thread chen yong
Hello everybody, I am puzzled by the execution sequence of the following Scala program. Please tell me if it runs in the same sequence on your computer, and whether that is normal. Thanks. Execution sequence according to line number: 26-7-20-27-9-10-12-13-15-7-20-17-18-9-10-12-13-15-7-20-17-18 1.
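
Since the code itself didn't come through, here is a guess at the pattern: lazy vals and by-name parameters run their bodies at first use, not at definition, which reorders what the console shows. A self-contained illustration:

```scala
// A standalone illustration (a guess, since the original code didn't
// come through): lazy/by-name evaluation reorders observed output.
object LazyOrder extends App {
  lazy val a = { println("init a"); 1 }        // runs at first use
  def twice(x: => Int): Int = { println("before"); x + x }

  println("start")  // prints first
  println(twice(a)) // then "before", then "init a" (once), then 2
}
```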

Reply: get different results when debugging and running scala program

2016-10-01 Thread chen yong
Dear Jakob, Thanks for your reply. The output text in the console is as follows. Running output: 1 2 3 4 5 6 7 in 3 in 2 in 1 in Debugging output: 01 02 03 04 05 06 07 08 09 10 11 in 3 in 2 in 2 in 1 in 1 in From: Jakob Odersky Sent:

Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append

2016-10-01 Thread Takeshi Yamamuro
I got this info from a Hadoop JIRA ticket: https://issues.apache.org/jira/browse/MAPREDUCE-5485 // maropu On Sat, Oct 1, 2016 at 7:14 PM, Igor Berman wrote: > Takeshi, why are you saying this? How have you checked that it's only used from > 2.7.3? > We use spark 2.0 which is

Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append

2016-10-01 Thread Igor Berman
Takeshi, why are you saying this? How have you checked that it's only used from 2.7.3? We use Spark 2.0, which is shipped with a Hadoop dependency of 2.7.2, and we use this setting. We've sort of "verified" it's used by configuring logging for the file output committer. On 30 September 2016 at 03:12, Takeshi
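
For what it's worth, a way to check from the application side which value is actually in effect; this assumes the setting under discussion is the v2 commit algorithm, since the excerpt doesn't name it:

```scala
// Assumption: the setting being discussed is the v2 commit algorithm;
// sc is the live SparkContext. Print what the running job sees.
val v = sc.hadoopConfiguration.get(
  "mapreduce.fileoutputcommitter.algorithm.version", "1")
println(s"file output committer algorithm version: $v")
```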