date:20180629

Re: [SparkLauncher] -Dspark.master with missing secondary master IP

2018-06-29 Thread bsikander

This is what my driver launch command looks like. It contains only one master in the -Dspark.master property, whereas from the Launcher I am passing two, both with port 6066. Launch Command: "/path/to/java" "-cp" "" "-Xmx1024M" "-Dspark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-server.properties"

Re: [SparkLauncher] -Dspark.master with missing secondary master IP

2018-06-29 Thread bsikander

Can anyone please help? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

RESTful Receiver

2018-06-29 Thread Timmy Duncan

Googling doesn't turn up a similar thread except https://forums.databricks.com/questions/2095/sparkstreaming-to-process-http-rest-end-point-serv.html, and it seems people there are doing this through a custom receiver as well

Re: Using newApiHadoopRDD for reading from HBase

2018-06-29 Thread Biplob Biswas

Can someone please help me out here, maybe by pointing to some documentation on this? I could find almost nothing. Thanks & Regards Biplob Biswas On Thu, Jun 28, 2018 at 11:13 AM Biplob Biswas wrote: > Hi, > > I had a few questions regarding the way *newApiHadoopRDD* accesses data > from

[SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-06-29 Thread Vincent Wang

Hi there, I'm using *GBTClassifier* to do some classification jobs and find the performance of the scoring stage unsatisfying. The trained model has about 160 trees, and the input feature vector is sparse with a size of about 20+. After some digging, I found the model will repeatedly and
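The cost of random access into a sparse vector during tree scoring can be illustrated with a small sketch. This is not Spark's code: `SparseVec`, `predict_tree`, and `score` are hypothetical names, and the nested-dict tree layout is an assumption made for illustration. It assumes a sparse vector stored as sorted indices plus values, where each lookup needs a binary search, and shows one possible mitigation: densify the vector once, then let every tree use plain array indexing.

```python
from bisect import bisect_left


class SparseVec:
    """Toy sparse vector (hypothetical, not Spark's SparseVector)."""

    def __init__(self, size, indices, values):
        self.size = size
        self.indices = list(indices)  # sorted positions of non-zero entries
        self.values = list(values)    # values matching `indices`

    def get(self, i):
        # Random access: binary search over the stored indices on every call.
        pos = bisect_left(self.indices, i)
        if pos < len(self.indices) and self.indices[pos] == i:
            return self.values[pos]
        return 0.0

    def to_dense(self):
        # One linear pass; afterwards each lookup is a plain list index.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


def predict_tree(tree, lookup):
    """Walk one tree of nested dicts; `lookup(i)` returns feature i."""
    node = tree
    while "leaf" not in node:
        go_left = lookup(node["feature"]) <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def score(trees, vec):
    """Sum the tree outputs, densifying the input once for all trees."""
    dense = vec.to_dense()
    return sum(predict_tree(t, dense.__getitem__) for t in trees)
```

Passing `vec.get` as the lookup instead of `dense.__getitem__` gives the same predictions but repeats the binary search at every split of every tree, which is the access pattern the post describes.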

Interactive queries

2018-06-29 Thread amin mohebbi

I am currently working on a project in which we are dealing with TBs of CSV files generated yearly. The main source of data is interval meter data, and the others are customer, tariff, and site information. We might be loading data going back to 2010. One CSV (10 GB min) file for each

Performance of Spark MLlib Kmean one function problem

2018-06-29 Thread llxlf

I'm reading the source code of Spark MLlib KMeans and found a snippet of code

Spark Streaming PID rate controller minRate default value

2018-06-29 Thread faxianzhao

Hi there, I think "spark.streaming.backpressure.pid.minRate" should default to "not set", like "spark.streaming.backpressure.initialRate". The default value of 100 is not good for my business. It would be better to explain it in more detail in the documentation and let users decide for themselves, like
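To see why a fixed floor can matter for a slow workload, here is a simplified sketch of a PID-style rate update with a lower bound. It is an illustration under stated assumptions, not Spark's implementation: the function name `pid_rate`, its parameters, and the gain values are hypothetical, and only the floor of 100 comes from the post above.

```python
def pid_rate(latest_rate, processing_rate, scheduling_delay_ms,
             batch_interval_ms, prev_error, delay_since_update_s,
             kp=1.0, ki=0.2, kd=0.0, min_rate=100.0):
    """Return a new ingestion rate (records/s), never below `min_rate`.

    All names and gain defaults are illustrative assumptions.
    """
    # Proportional term: how far the current rate exceeds what was processed.
    error = latest_rate - processing_rate
    # Integral term: backlog implied by the scheduling delay.
    historical_error = scheduling_delay_ms * processing_rate / batch_interval_ms
    # Derivative term: change in error since the last update.
    d_error = (error - prev_error) / delay_since_update_s

    new_rate = latest_rate - kp * error - ki * historical_error - kd * d_error
    # The floor: the controller cannot throttle below min_rate.
    return max(new_rate, min_rate)


# A job that can only sustain 40 records/s is still fed 100 records/s
# when the floor is 100; lowering the floor lets the rate follow the job.
clamped = pid_rate(500.0, 40.0, 0.0, 1000.0, 0.0, 1.0)
unclamped = pid_rate(500.0, 40.0, 0.0, 1000.0, 0.0, 1.0, min_rate=10.0)
```

In this sketch the floor overrides the controller's estimate whenever the sustainable rate is below `min_rate`, which is the situation the post asks to make configurable and better documented.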

Setting log level to DEBUG while keeping httpclient.wire on WARN

2018-06-29 Thread Daniel Haviv

Hi, I'm trying to debug an issue with Spark, so I've set the log level to DEBUG, but at the same time I'd like to avoid httpclient.wire's verbose output by setting it to WARN. I tried the following log4j.properties config but I'm still getting DEBUG output for httpclient.wire:
