read parallel processing spark-cassandra

2018-02-13 Thread sujeet jog
Folks, I have a time series table with each record having 350 columns. The primary key is ((date, bucket), objectid, timestamp). The objective is to read one day's worth of data, which comes to around 12k partitions; each partition has around 25 MB of data. I see only 1 task active during the read
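A minimal sketch of one way to spread such a read across many tasks with the spark-cassandra-connector: enumerate the day's (date, bucket) partition keys and join against the table instead of scanning it. Keyspace, table, host, and the bucket range below are assumptions.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("parallel-read-sketch")
  .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
  .set("spark.cassandra.input.split.sizeInMB", "64")   // smaller splits => more read tasks
val sc = new SparkContext(conf)

// Hypothetical bucket range; adjust to however buckets are assigned.
val day = "2018-02-13"
val keys = sc.parallelize((0 until 100).map(b => (day, b)), 48) // 48 partitions => up to 48 parallel reads
val dayRdd = keys
  .joinWithCassandraTable("my_keyspace", "timeseries") // hypothetical names
  .on(SomeColumns("date", "bucket"))
```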

not able to read git info from Scala Test Suite

2018-02-13 Thread karan alang
Hello - I'm writing a Scala unit test for my Spark project which checks the git information, and somehow it is not working from the unit test. Added in pom.xml -- pl.project13.maven : git-commit-id-plugin : 2.2.4
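For reference, a minimal check along these lines, assuming the plugin is configured with generateGitPropertiesFile=true so a git.properties file lands on the test classpath (the resource path and the git.commit.id key are the plugin's defaults):

```scala
import java.util.Properties

// Load the git.properties that git-commit-id-plugin generates at build time.
val stream = getClass.getResourceAsStream("/git.properties")
assert(stream != null, "git.properties not found on the test classpath")
val props = new Properties()
props.load(stream)
stream.close()
assert(props.getProperty("git.commit.id") != null) // default key for the commit hash
```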

[Spark GraphX pregel] default value for EdgeDirection not consistent between programming guide and API documentation

2018-02-13 Thread Ramon Bejar Torres
Hi, I just wanted to point out that in the API doc page for the pregel operator (GraphX API for Spark 2.2.1):
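For context, the operator's signature as it appears in the 2.2.x GraphX API (quoted from memory, so verify against the docs); the reported inconsistency concerns the default value of activeDirection:

```scala
// Graph.pregel in the 2.2.x API; the programming guide's listing shows a
// different default for activeDirection than the API documentation below.
def pregel[A: ClassTag](
    initialMsg: A,
    maxIterations: Int = Int.MaxValue,
    activeDirection: EdgeDirection = EdgeDirection.Either)(
    vprog: (VertexId, VD, A) => VD,
    sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
    mergeMsg: (A, A) => A): Graph[VD, ED]
```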

Inefficient state management in stream to stream join in 2.3

2018-02-13 Thread Yogesh Mahajan
In 2.3, stream-to-stream joins (both inner and outer) are implemented using the symmetric hash join (SHJ) algorithm, which is a good choice, and I am sure you compared it with other families of algorithms such as XJoin and non-blocking sort-based algorithms like progressive merge join (PMJ
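For readers following along, a minimal 2.3-style stream-stream inner join in the spirit of the programming guide's example (the rate sources and column names are stand-ins); the watermarks plus the time-range predicate are what let the SHJ evict old state rather than buffer both sides forever:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("shj-sketch").master("local[*]").getOrCreate()

// The rate source stands in for real inputs.
val impressions = spark.readStream.format("rate").load()
  .selectExpr("value AS impressionAdId", "timestamp AS impressionTime")
  .withWatermark("impressionTime", "10 minutes")
val clicks = spark.readStream.format("rate").load()
  .selectExpr("value AS clickAdId", "timestamp AS clickTime")
  .withWatermark("clickTime", "20 minutes")

// Watermark + time bound => the join state for each side can be cleaned up.
val joined = impressions.join(clicks, expr(
  "clickAdId = impressionAdId AND " +
  "clickTime >= impressionTime AND " +
  "clickTime <= impressionTime + interval 1 hour"))
```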

Why python cluster mode is not supported in standalone cluster?

2018-02-13 Thread Ashwin Sai Shankar
Hi Spark users! I noticed that Spark doesn't allow Python apps to run in cluster mode on a Spark standalone cluster. Does anyone know the reason? I checked JIRA but couldn't find anything relevant. Thanks, Ashwin

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-02-13 Thread Yogesh Mahajan
I had a similar issue, and I think that's where the Structured Streaming design falls short. It seems Question #2 in your email is a viable workaround for you. In my case, I have a custom Sink backed by an efficient in-memory column store suited for fast ingestion. I have a Kafka stream coming from
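For concreteness, the 2.x-era Sink contract the poster describes implementing looks roughly like the sketch below. Note it lives in an internal package, so it can change between releases; the column-store ingestion itself is a placeholder.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

class ColumnStoreSink extends Sink {
  // Called once per micro-batch; batchId is monotonically increasing, which
  // a sink can use to make re-delivered batches idempotent.
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    data.collect().foreach { row =>
      // ingest `row` into the in-memory column store (application-specific)
    }
  }
}
```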

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-13 Thread dcam
Hi Mina, I believe this is different for Structured Streaming from Kafka, specifically. I'm assuming you are using Structured Streaming based on the name of the dependency ("spark-streaming-kafka"). There is a note in the docs here:
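For anyone hitting the same thing, these are the documented Kafka source options in play (broker and topic below are placeholders); failOnDataLoss governs whether the query dies when requested offsets have already been aged out of the topic:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-sketch").master("local[*]").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "events")                    // placeholder topic
  .option("startingOffsets", "latest")
  .option("failOnDataLoss", "false") // don't kill the query when offsets are gone
  .load()
```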

Spark 2.2.1 EMR 5.11.1 Encrypted S3 bucket overwriting parquet file

2018-02-13 Thread Stephen Robinson
Hi All, I am using the latest version of EMR to overwrite Parquet files to an S3 bucket encrypted with a KMS key. I am seeing the attached error whenever I overwrite a Parquet file. For example, the code below produces the attached error and stack trace:
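Without the stack trace it is hard to be definitive, but for the s3a connector the standard SSE-KMS settings look like the sketch below (bucket, path, and key ARN are placeholders; EMR's own EMRFS s3:// URIs are configured through its security configuration instead):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("s3-kms-sketch").getOrCreate()

val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
hc.set("fs.s3a.server-side-encryption.key", "<your-kms-key-arn>") // placeholder

spark.range(10).write.mode("overwrite").parquet("s3a://my-bucket/table/") // placeholder path
```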

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-02-13 Thread dcam
Hi Priyank, I have a similar structure, although I am reading from Kafka and sinking to multiple MySQL tables. My input stream has multiple message types, and each is headed for a different MySQL table. I've looked for a solution for a few months and have only come up with two alternatives: 1.
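A sketch of what one such alternative amounts to, under assumptions: the stream is already parsed into columns including a messageType discriminator, and MySqlWriter is a hypothetical ForeachWriter that would own a JDBC connection per partition. Each query re-reads and filters the source independently:

```scala
import org.apache.spark.sql.{DataFrame, ForeachWriter, Row, SparkSession}

val spark = SparkSession.builder.appName("fanout-sketch").getOrCreate()
import spark.implicits._

// Hypothetical writer: one instance per target MySQL table.
class MySqlWriter(table: String) extends ForeachWriter[Row] {
  def open(partitionId: Long, version: Long): Boolean = true // open a JDBC connection here
  def process(row: Row): Unit = ()                           // INSERT the row into `table`
  def close(errorOrNull: Throwable): Unit = ()               // close the connection
}

val parsed: DataFrame = ??? // stands for the Kafka stream parsed into columns

// One streaming query per destination table, each with its own filter.
parsed.filter($"messageType" === "order")
  .writeStream.foreach(new MySqlWriter("orders")).start()
parsed.filter($"messageType" === "user")
  .writeStream.foreach(new MySqlWriter("users")).start()
```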

Retrieve batch metadata via the spark monitoring api

2018-02-13 Thread Hendrik Dev
I use Spark 2.2.1 with streaming, and when I open the Spark Streaming UI I can see input metadata for each of my batches. In my case I stream from Kafka, and in the metadata section I find useful information about my topic, partitions, and offsets. Assume the URL for this batch looks like
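Presumably a URL along these lines; the monitoring REST API exposes batch-level endpoints for streaming applications (host, port, and application id below are placeholders):

```scala
import scala.io.Source

val appId = "app-20180213120000-0000" // placeholder application id
val url = s"http://driver-host:4040/api/v1/applications/$appId/streaming/batches"
println(Source.fromURL(url).mkString) // JSON description of recent batches
```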

Run Multiple Spark jobs. Reduce Execution time.

2018-02-13 Thread akshay naidu
Hello, I'm trying to run multiple Spark jobs on a cluster running on YARN. The master is a 24 GB server with 6 slaves of 12 GB each. The fairscheduler.xml settings are: schedulingMode FAIR, weight 10, minShare 2. I am running 8 jobs simultaneously; the jobs run in parallel, but not all of them: at any one time only 7 of them run simultaneously
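A sketch of the pool wiring those settings imply, assuming a pool named "production" defined in the allocation file (the pool name and file path here are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fair-pools-sketch")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml") // placeholder path
val sc = new SparkContext(conf)

// Jobs submitted from this thread land in the named pool.
sc.setLocalProperty("spark.scheduler.pool", "production") // placeholder pool name
```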

Re: can udaf's return complex types?

2018-02-13 Thread Matteo Cossu
Hello, yes, they can certainly return complex types. For example, the functions collect_list and collect_set return an ArrayType. On 10 February 2018 at 14:28, kant kodali wrote: > Hi All, > > Can UDAFs return complex types? Like, say, a Map with the key as an Integer and > the value
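A minimal sketch of such a UDAF via the 2.x UserDefinedAggregateFunction API (the class name and counting logic are illustrative): it returns a Map[Int, Long] of per-value counts.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class CountByKey extends UserDefinedAggregateFunction {
  def inputSchema: StructType = new StructType().add("key", IntegerType)
  def bufferSchema: StructType = new StructType().add("counts", MapType(IntegerType, LongType))
  def dataType: DataType = MapType(IntegerType, LongType) // complex return type
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer(0) = Map.empty[Int, Long]

  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) {
      val m = buffer.getMap[Int, Long](0).toMap
      val k = input.getInt(0)
      buffer(0) = m + (k -> (m.getOrElse(k, 0L) + 1L))
    }

  def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
    val m1 = b1.getMap[Int, Long](0).toMap
    val m2 = b2.getMap[Int, Long](0)
    b1(0) = m2.foldLeft(m1) { case (acc, (k, v)) => acc + (k -> (acc.getOrElse(k, 0L) + v)) }
  }

  def evaluate(buffer: Row): Any = buffer.getMap[Int, Long](0)
}
```

Applied directly, e.g. df.agg(new CountByKey()(col("key"))), the result column carries the MapType.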

[Spark-Listener] [How-to] Listen only to specific events

2018-02-13 Thread Naved Alam
I have a Spark application which creates multiple sessions. Each of these sessions can run jobs in parallel. I want to log some details about the execution of these jobs, but want to tag them with the session they were called from. I tried creating a listener from within each session
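One sketch that avoids per-session listeners, under assumptions (the property key is made up): set a local property on each session's thread and read it back from the job-start event in a single shared listener.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

class SessionTaggingListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Local properties set on the submitting thread travel with the job.
    val session = Option(jobStart.properties)
      .flatMap(p => Option(p.getProperty("custom.sessionId")))
      .getOrElse("unknown")
    println(s"job ${jobStart.jobId} started by session $session")
  }
}

// In each session's thread, before running jobs:
//   sc.setLocalProperty("custom.sessionId", "session-42")
// Register once: sc.addSparkListener(new SessionTaggingListener)
```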