Re: [SparkLauncher] -Dspark.master with missing secondary master IP

2018-06-29 Thread bsikander
This is what my driver launch command looks like. It contains only one master in the -Dspark.master property, whereas from the Launcher I am passing two, both on port 6066. Launch Command: "/path/to/java" "-cp" "" "-Xmx1024M" "-Dspark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-server.properties"
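
One way to check whether both masters survive into the driver is to parse the launch command and compare its -Dspark.master value with what was passed to the launcher. Below is a minimal Python sketch. It assumes the standalone HA master URL format spark://host1:port1,host2:port2. The function names, the host names, and the -Dspark.master value in the sample command are hypothetical and used only for illustration.

```python
import shlex
from typing import List

MASTER_PREFIX = "-Dspark.master="


def masters_in_launch_command(command: str) -> List[str]:
    """Return the host:port entries of the -Dspark.master=... argument, or [] if absent."""
    for arg in shlex.split(command):
        if arg.startswith(MASTER_PREFIX):
            url = arg[len(MASTER_PREFIX):]
            # Standalone HA master URLs look like spark://host1:port1,host2:port2
            if url.startswith("spark://"):
                url = url[len("spark://"):]
            return [entry for entry in url.split(",") if entry]
    return []


def missing_masters(command: str, expected: List[str]) -> List[str]:
    """Expected host:port entries that do not appear in the launch command."""
    present = set(masters_in_launch_command(command))
    return [m for m in expected if m not in present]


if __name__ == "__main__":
    # Hypothetical command and host names, for illustration only.
    cmd = '"/path/to/java" "-cp" "" "-Xmx1024M" "-Dspark.master=spark://master1:6066"'
    print(missing_masters(cmd, ["master1:6066", "master2:6066"]))
```

A non-empty result means a master was dropped somewhere between the launcher and the driver.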

Re: [SparkLauncher] -Dspark.master with missing secondary master IP

2018-06-29 Thread bsikander
Can anyone please help? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

RESTful Receiver

2018-06-29 Thread Timmy Duncan
Googling doesn't turn up any similar thread except https://forums.databricks.com/questions/2095/sparkstreaming-to-process-http-rest-end-point-serv.html, and it seems people are doing this through a custom receiver as well

Re: Using newApiHadoopRDD for reading from HBase

2018-06-29 Thread Biplob Biswas
Can someone please help me out here, or maybe point me to some documentation on this? I could hardly find anything. Thanks & Regards Biplob Biswas On Thu, Jun 28, 2018 at 11:13 AM Biplob Biswas wrote: > Hi, > > I had a few questions regarding the way *newApiHadoopRDD* accesses data > from

[SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-06-29 Thread Vincent Wang
Hi there, I'm using *GBTClassifier* to do some classification jobs and find that the performance of the scoring stage is not quite satisfactory. The trained model has about 160 trees, and the input feature vector is sparse with a size of about 20+. After some digging, I found the model will repeatedly and
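
The post is cut off, but the general cost it points at can be illustrated. In Spark ML, reading a SparseVector element by index does a binary search over the stored indices. That means each split test in each tree pays an O(log nnz) search instead of an O(1) array read. A common workaround is to expand the row to a dense array once and then score every tree against it. The following is a stdlib-only Python sketch of that idea, not Spark code. SparseVec, the tuple tree encoding, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Callable, List, Sequence, Tuple, Union


class SparseVec:
    """Sorted indices + values; element access by binary search."""

    def __init__(self, size: int, indices: Sequence[int], values: Sequence[float]):
        self.size = size
        self.indices = list(indices)
        self.values = list(values)

    def get(self, i: int) -> float:
        # O(log nnz) per lookup: paid on every split test when reading the sparse row.
        j = bisect_left(self.indices, i)
        if j < len(self.indices) and self.indices[j] == i:
            return self.values[j]
        return 0.0

    def to_dense(self) -> List[float]:
        # O(size + nnz) once per row; later lookups are O(1) list indexing.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


# Hypothetical tree encoding: a leaf is a float; an internal node is
# (feature_index, threshold, left_subtree, right_subtree).
Tree = Union[float, Tuple[int, float, "Tree", "Tree"]]


def predict_tree(tree: Tree, lookup: Callable[[int], float]) -> float:
    node = tree
    while not isinstance(node, float):
        feature, threshold, left, right = node
        node = left if lookup(feature) <= threshold else right
    return node


def score_ensemble(trees: List[Tree], row: SparseVec, densify: bool = True) -> float:
    """Sum of tree outputs; with densify=True the row is expanded once for all trees."""
    if densify:
        lookup = row.to_dense().__getitem__
    else:
        lookup = row.get
    return sum(predict_tree(t, lookup) for t in trees)


if __name__ == "__main__":
    trees = [(0, 0.5, -1.0, 1.0), (3, 2.0, (1, 0.0, 0.2, 0.3), -0.5)]
    row = SparseVec(5, [0, 3], [0.7, 1.5])
    print(score_ensemble(trees, row), score_ensemble(trees, row, densify=False))
```

Both paths return the same score. Only the per-lookup cost differs. With only about 20 features, densifying a row is likely cheap compared with 160 tree traversals. For very wide, very sparse rows the trade-off can go the other way.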

Interactive queries

2018-06-29 Thread amin mohebbi
I am currently working on a project in which we are dealing with TBs of CSV files generated yearly. The main source of data is interval meter data; the others are customer information, tariff, and site information. We might be loading data going back to 2010. One CSV (10 GB min) file for each

Performance of Spark MLlib Kmean one function problem

2018-06-29 Thread llxlf
I'm reading the source code of Spark MLlib KMeans, and I found a snippet of code

Spark Streaming PID rate controller minRate default value

2018-06-29 Thread faxianzhao
Hi there, I think "spark.streaming.backpressure.pid.minRate" should be left unset by default, like "spark.streaming.backpressure.initialRate". The default value of 100 does not suit my use case. It would be better to explain it in more detail in the documentation and let users decide for themselves, like
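
The effect of the floor shows up in a simplified re-sketch of the PID update. Spark's PID backpressure estimator combines proportional (default 1.0), integral (default 0.2), and derivative (default 0.0) terms, then clamps the result to at least spark.streaming.backpressure.pid.minRate (default 100). The Python below is a hypothetical paraphrase of that update for illustration, not Spark's code. The class and parameter names are illustrative.

```python
from typing import Optional


class PIDRateSketch:
    """Simplified PID-style rate estimator whose output is clamped to min_rate.

    Hypothetical paraphrase of the general shape of Spark Streaming's PID
    backpressure estimator; not Spark code.
    """

    def __init__(self, batch_interval_ms: float, proportional: float = 1.0,
                 integral: float = 0.2, derivative: float = 0.0,
                 min_rate: float = 100.0) -> None:
        self.batch_interval_ms = batch_interval_ms
        self.proportional = proportional
        self.integral = integral
        self.derivative = derivative
        self.min_rate = min_rate
        self.first_run = True
        self.latest_time = -1
        self.latest_rate = -1.0
        self.latest_error = -1.0

    def compute(self, time_ms: int, num_elements: int,
                processing_delay_ms: int, scheduling_delay_ms: int) -> Optional[float]:
        """Return a new rate (records/s) after a batch, or None if there is no estimate."""
        if time_ms <= self.latest_time or num_elements <= 0 or processing_delay_ms <= 0:
            return None
        delay_since_update = (time_ms - self.latest_time) / 1000.0
        processing_rate = num_elements / processing_delay_ms * 1000.0  # records/s
        error = self.latest_rate - processing_rate
        # Integral-like term: scheduling delay scaled by processing rate and batch interval.
        historical_error = scheduling_delay_ms * processing_rate / self.batch_interval_ms
        d_error = (error - self.latest_error) / delay_since_update
        new_rate = (self.latest_rate
                    - self.proportional * error
                    - self.integral * historical_error
                    - self.derivative * d_error)
        # The floor: the estimate never drops below min_rate.
        new_rate = max(new_rate, self.min_rate)
        self.latest_time = time_ms
        if self.first_run:
            # The first batch only seeds the state with the observed rate.
            self.latest_rate = processing_rate
            self.latest_error = 0.0
            self.first_run = False
            return None
        self.latest_rate = new_rate
        self.latest_error = error
        return new_rate


if __name__ == "__main__":
    for floor in (100.0, 1.0):
        est = PIDRateSketch(batch_interval_ms=1000, min_rate=floor)
        est.compute(1000, 50, 1000, 0)  # seeds state, returns None
        print(floor, est.compute(2000, 50, 1000, 0))
```

Consider a job that actually processes only 50 records/s. The default floor still yields an estimate of 100 records/s. Lowering min_rate lets the estimate follow the real processing rate (50 records/s here).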

Setting log level to DEBUG while keeping httpclient.wire on WARN

2018-06-29 Thread Daniel Haviv
Hi, I'm trying to debug an issue with Spark, so I've set the log level to DEBUG, but at the same time I'd like to suppress httpclient.wire's verbose output by setting it to WARN. I tried the following log4j.properties config, but I'm still getting DEBUG output for httpclient.wire: