Re: Livy Failed error on Yarn with Spark

2018-05-24 Thread Jeff Zhang
Could you check the spark app's yarn log and livy log? Chetan Khatri wrote on Thu, May 10, 2018 at 4:18 AM: > All, > > I am running on Hortonworks HDP Hadoop with Livy and Spark 2.2.0, when I > am running the same spark job using spark-submit it is getting success with all >
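For the YARN side, the aggregated container logs can be pulled with the `yarn logs` CLI once the app has finished (a command fragment; the application id and the Livy log path are placeholders — the actual path varies by distribution):

```shell
# Fetch aggregated container logs for the failed Spark app
# (requires yarn.log-aggregation-enable=true on the cluster)
yarn logs -applicationId application_1526000000000_0001 > app.log

# Livy server log; location is distribution-specific, e.g. on HDP it is
# commonly somewhere under /var/log/livy*/ (illustrative path):
tail -n 200 /var/log/livy/livy-server.log
```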

Re: [Spark] Supporting python 3.5?

2018-05-24 Thread Jeff Zhang
It supports python 3.5, and IIRC, spark also supports python 3.6. Irving Duran wrote on Thu, May 10, 2018 at 9:08 PM: > Does spark now support python 3.5 or is it just 3.4.x? > > https://spark.apache.org/docs/latest/rdd-programming-guide.html > > Thank You, > > Irving Duran >
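To pin PySpark to a specific interpreter, the standard environment variables can be set in `conf/spark-env.sh` or in the shell before submitting (a config fragment; the interpreter paths are illustrative):

```shell
# Interpreter used by the executors
export PYSPARK_PYTHON=/usr/bin/python3.5
# Interpreter used by the driver (defaults to PYSPARK_PYTHON if unset)
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5
```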

Re: Spark on YARN in client-mode: do we need 1 vCore for the AM?

2018-05-24 Thread Jeff Zhang
I don't think it is possible to have less than 1 core for the AM; this is due to yarn, not spark. The number of AMs compared to the number of executors should be small and acceptable. If you do want to save more resources, I would suggest using yarn cluster mode, where the driver and AM run in the
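The two deploy modes can be contrasted with a submit fragment (the class and jar names are hypothetical; in client mode the separate AM container's cores come from `spark.yarn.am.cores`, which defaults to 1):

```shell
# Client mode: driver runs locally, and YARN allocates a separate AM container
spark-submit --master yarn --deploy-mode client \
  --class com.example.MyApp my-app.jar

# Cluster mode: the driver runs inside the AM container itself,
# so no extra AM-only container is consumed
spark-submit --master yarn --deploy-mode cluster \
  --driver-cores 1 --driver-memory 2g \
  --class com.example.MyApp my-app.jar
```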

Re: help with streaming batch interval question needed

2018-05-24 Thread Peter Liu
Hi there, from the apache spark streaming docs (see links below): - the batch interval is set when a spark StreamingContext is constructed (see example (a) quoted below) - the StreamingContext is available in both older and newer Spark versions (v1.6, v2.2 to v2.3.0) (see
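A minimal DStream setup showing where the batch interval goes (a sketch, not runnable without a Spark install and a socket source; the `5` is the batch interval in seconds, fixed at construction time):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "BatchIntervalDemo")
ssc = StreamingContext(sc, 5)  # batch interval: one micro-batch every 5 seconds

# Hypothetical source: text lines from a local socket
lines = ssc.socketTextStream("localhost", 9999)
lines.count().pprint()  # print the record count of each 5-second batch

ssc.start()
ssc.awaitTermination()
```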

Re: [Beginner][StructuredStreaming] Using Spark aggregation - WithWatermark on old data

2018-05-24 Thread karthikjay
My data looks like this: { "ts2" : "2018/05/01 00:02:50.041", "serviceGroupId" : "123", "userId" : "avv-0", "stream" : "", "lastUserActivity" : "00:02:50", "lastUserActivityCount" : "0" } { "ts2" : "2018/05/01 00:09:02.079", "serviceGroupId" : "123", "userId" : "avv-0",
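A watermarked aggregation over records of this shape could be sketched as follows (assumptions: the column names and timestamp format are taken from the sample above, but the input path, window width, and watermark delay are illustrative; not runnable without a Spark install):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_timestamp, window

spark = SparkSession.builder.appName("WatermarkDemo").getOrCreate()

events = (spark.readStream.format("json")
          .schema("ts2 STRING, serviceGroupId STRING, userId STRING, "
                  "stream STRING, lastUserActivity STRING, lastUserActivityCount STRING")
          .load("/data/events"))  # hypothetical input path

agg = (events
       # parse the "2018/05/01 00:02:50.041" string into an event-time column
       .withColumn("eventTime", to_timestamp(col("ts2"), "yyyy/MM/dd HH:mm:ss.SSS"))
       # state for windows older than (max event time - 10 min) is dropped;
       # old data arriving behind the watermark is ignored, which matters
       # when replaying historical data
       .withWatermark("eventTime", "10 minutes")
       .groupBy(window(col("eventTime"), "5 minutes"), col("userId"))
       .count())
```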

Re: Time series data

2018-05-24 Thread Vadim Semenov
Yeah, it depends on what you want to do with that timeseries data. We at Datadog process trillions of points daily using Spark. I cannot really go into what exactly we do with the data, but I can say that Spark can handle the volume, scale well and be fault-tolerant, albeit everything I said

Streaming : WAL ignored

2018-05-24 Thread Walid Lezzar
Hi, I have a spark streaming application running on yarn that consumes from a JMS source. I have checkpointing and the WAL enabled to ensure zero data loss. However, when I suddenly kill my application and restart it, sometimes it recovers the data from the WAL but sometimes it doesn't! In
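For checkpoint-based recovery to work, the documented pattern is `StreamingContext.getOrCreate` with *all* DStream setup inside the factory function; a common pitfall is recreating the context unconditionally on restart, which silently starts fresh. A sketch (the checkpoint path is hypothetical; not runnable without a Spark install):

```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

CHECKPOINT_DIR = "hdfs:///checkpoints/my-app"  # hypothetical path

def create_context():
    conf = (SparkConf().setAppName("WALDemo")
            # the WAL must be enabled before the context exists
            .set("spark.streaming.receiver.writeAheadLog.enable", "true"))
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 10)
    ssc.checkpoint(CHECKPOINT_DIR)
    # ...define all sources and transformations here, not outside...
    return ssc

# On a clean start this calls create_context(); after a kill/restart it
# rebuilds the context (and replays WAL data) from the checkpoint instead
ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
ssc.start()
ssc.awaitTermination()
```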

Positive log-likelihood with Gaussian mixture

2018-05-24 Thread Simon Dirmeier
Dear all, I am fitting a very trivial GMM with 2-10 components on 100 samples and 5 features in pyspark and observe some of the log-likelihoods being positive (see below). I don't understand how this is possible. Is this a bug or intended behaviour? Furthermore, for different seeds,
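A positive log-likelihood is actually possible for continuous densities: the likelihood is a density value, not a probability, so it can exceed 1 when a component's variance is small, making its log positive. A pure-Python illustration with a univariate Gaussian:

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log of the univariate normal density N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# With a small standard deviation, the density at the mean exceeds 1...
print(math.exp(gaussian_logpdf(0.0, 0.0, 0.1)))  # ≈ 3.989
# ...so the log-density (and a sum of such terms) is positive
print(gaussian_logpdf(0.0, 0.0, 0.1))            # ≈ 1.384
# With sigma = 1 the density stays below 1 and the log is negative
print(gaussian_logpdf(0.0, 0.0, 1.0))            # ≈ -0.919
```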

Re: Time series data

2018-05-24 Thread Jörn Franke
There is not one answer to this. It really depends what kind of time series analysis you do with the data and what time series database you are using. Then it also depends what ETL you need to do. You seem to also need to join data - is it with existing data of the same type, or do you join

Time series data

2018-05-24 Thread amin mohebbi
Could you please help me to understand the performance that we get from using spark with any nosql or TSDB? We receive 1 mil meters x 288 readings = 288 mil rows (approx. 360 GB per day) – therefore, we will end up with 10's or 100's of TBs of data, and I feel that NoSQL will be much quicker
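The volume arithmetic in the question, written out (the ~1.25 KB/row figure is simply what 360 GB / 288 mil rows implies, not a stated fact):

```python
meters = 1_000_000
readings_per_day = 288          # one reading every 5 minutes
rows_per_day = meters * readings_per_day
print(rows_per_day)             # 288,000,000 rows/day, matching the estimate

gb_per_day = 360
bytes_per_row = gb_per_day * 10**9 / rows_per_day
print(bytes_per_row)            # 1250.0 implied bytes per row

tb_per_year = gb_per_day * 365 / 1000
print(tb_per_year)              # 131.4 TB/year, i.e. 100+ TB within a year
```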