Re: Stream is corrupted in ShuffleBlockFetcherIterator

2019-08-27 Thread Mikhail Pryakhin
Thank you all for your help. The issue was caused by a few failed disks in the cluster. Right after they had been replaced, everything worked well. Looking forward to moving to Spark 3.0, which is able to manage corrupted shuffle blocks. Cheers, Mike Pryakhin. On Wed, 28 Aug 2019 at 03:44, Darshan

Re: web access to sparkUI on docker or k8s pods

2019-08-27 Thread Yaniv Harpaz
thank you, I will check it out. Yaniv Harpaz [ yaniv.harpaz at gmail.com ] On Wed, Aug 28, 2019 at 7:14 AM Rao, Abhishek (Nokia - IN/Bangalore) <abhishek@nokia.com> wrote: > Hi, > We have seen this issue when we tried to bring up the UI on a custom ingress > path (default ingress path

Are groupBy and partition similar in this scenario? Do I still need to do partitioning here to save into Cassandra?

2019-08-27 Thread Shyam P
Hi, Are groupBy and partition similar in this scenario? I know they are not similar and are meant for different purposes, but I am confused here. Do I still need to do partitioning here to save into Cassandra? Below is my scenario. I am using spark-sql-2.4.1, spark-cassandra-connector_2.11-2.4.1 with
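For illustration, a minimal PySpark sketch of the pattern being asked about (keyspace and table names are made up): groupBy already shuffles rows by the grouping key, so the aggregated result can usually be written straight to Cassandra through the connector's DataFrame source without an extra repartition step.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cassandra-write-sketch")
             .getOrCreate())

    # Hypothetical source table read through the Cassandra connector
    df = (spark.read.format("org.apache.spark.sql.cassandra")
          .options(table="events", keyspace="demo")
          .load())

    # groupBy performs its own shuffle by key; no separate partitioning
    # step is required before the write
    agg = df.groupBy("account_id").count()

    (agg.write.format("org.apache.spark.sql.cassandra")
     .options(table="event_counts", keyspace="demo")  # hypothetical target
     .mode("append")
     .save())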

RE: web access to sparkUI on docker or k8s pods

2019-08-27 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi, We have seen this issue when we tried to bring up the UI on a custom ingress path (the default ingress path “/” works). Do you also have a similar configuration? We tried setting spark.ui.proxyBase and spark.ui.reverseProxy, but it did not help. As a workaround, we’re using the ingress port (port on edge
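For reference, a minimal sketch of the settings mentioned above, applied at session creation (the gateway URL and base path are placeholders):

    from pyspark.sql import SparkSession

    # spark.ui.reverseProxy makes the master reverse-proxy the worker and
    # application UIs; spark.ui.reverseProxyUrl is the externally visible
    # base URL, and spark.ui.proxyBase prefixes the UI's generated links.
    spark = (SparkSession.builder
             .config("spark.ui.reverseProxy", "true")
             .config("spark.ui.reverseProxyUrl", "https://gateway.example.com/spark")
             .config("spark.ui.proxyBase", "/spark")
             .getOrCreate())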

question about pyarrow.Table to pyspark.DataFrame conversion

2019-08-27 Thread Artem Kozhevnikov
I wonder if there's some recommended method to convert an in-memory pyarrow.Table (or pyarrow.RecordBatch) to a pyspark.DataFrame without using pandas? My motivation is about converting nested data (like List[int]) that has an efficient representation in pyarrow, which is not possible with Pandas (I
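One possible workaround, in the absence of a direct API: round-trip the Arrow table through Parquet, which preserves nested types without going through pandas. A sketch (the path is a placeholder):

    import pyarrow as pa
    import pyarrow.parquet as pq
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # A nested list<int64> column: exactly the case that loses fidelity in pandas
    table = pa.table({"ids": pa.array([[1, 2], [3]], type=pa.list_(pa.int64()))})

    pq.write_table(table, "/tmp/arrow_roundtrip.parquet")  # placeholder path
    df = spark.read.parquet("/tmp/arrow_roundtrip.parquet")
    df.printSchema()  # ids: array<bigint>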

Re: Stream is corrupted in ShuffleBlockFetcherIterator

2019-08-27 Thread Darshan Pandya
You can also try setting "spark.io.compression.codec" to "snappy" to try a different compression codec. On Fri, Aug 16, 2019 at 10:14 AM Vadim Semenov wrote: > This is what you're looking for: > > Handle large corrupt shuffle blocks > https://issues.apache.org/jira/browse/SPARK-26089 > > So
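A sketch of how that suggestion might be applied (Spark's default I/O codec is lz4; switching to snappy can help rule out a codec-specific decompression failure):

    from pyspark.sql import SparkSession

    # Switch the codec used for shuffle and spill data from the default
    # (lz4) to snappy while diagnosing the corrupted-stream error
    spark = (SparkSession.builder
             .config("spark.io.compression.codec", "snappy")
             .getOrCreate())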

Re: Structured Streaming Dataframe Size

2019-08-27 Thread Tathagata Das
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts > *Note that Structured Streaming does not materialize the entire table.* It reads the latest available data from the streaming data source, processes it incrementally to update the result, and then
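To illustrate the incremental model described in that quote, a minimal sketch (the socket source, host, and port are placeholders): between micro-batches Spark retains only the running aggregate state, never the full unbounded input.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Conceptually unbounded input; rows are processed incrementally
    lines = (spark.readStream.format("socket")
             .option("host", "localhost")
             .option("port", 9999)
             .load())

    # Only the per-value running counts (the state) survive between batches
    counts = lines.groupBy("value").count()

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()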

Structured Streaming Dataframe Size

2019-08-27 Thread Nick Dawes
I have a quick newbie question. Spark Structured Streaming creates an unbounded dataframe that keeps appending rows. So what's the max size of data it can hold? What if the size becomes bigger than the JVM heap? Will it spill to disk? I'm using S3 as storage. So will it write temp data on S3 or

Driver - Stops Scheduling Streaming Jobs

2019-08-27 Thread Bryan Jeffrey
Hello! We're running Spark 2.3.0 on Scala 2.11. We have a number of Spark Streaming jobs that are using MapWithState. We've observed that these jobs will complete some set of stages, and then not schedule the next set of stages. It looks like the DAG Scheduler correctly identifies required

Blue-Green Deployment of Structured Streaming

2019-08-27 Thread Cressy, Taylor
Hi all, We are attempting to come up with a blue-green deployment strategy for our structured streaming job to minimize downtime. The general flow would be: 1. Job A is currently streaming 2. Job B comes up and starts loading Job A state without starting its query. 3. Job B completes
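A rough, purely illustrative sketch of that handoff idea (checkpoint path and source are made up): the standby job points at the same checkpoint location, so when its query finally starts it resumes the incumbent's offsets and state. Note that only one query may run against a given checkpoint at a time.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("job-b").getOrCreate()

    stream = spark.readStream.format("rate").load()  # placeholder source

    # Hypothetical: Job B reuses Job A's checkpoint directory; starting this
    # query only after Job A stops gives a state-preserving cutover
    query = (stream.writeStream
             .format("console")
             .option("checkpointLocation", "s3://bucket/checkpoints/shared")
             .start())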

web access to sparkUI on docker or k8s pods

2019-08-27 Thread Yaniv Harpaz
hello guys, when I launch driver pods, or even when I use docker run with the Spark image, the Spark master UI (8080) works great, but the sparkUI (4040) loads without the CSS. When I dig a bit deeper I see "Refused to apply style from '' because its MIME type ('text/html') is not supported

Spark on k8s: Mount config map in executor

2019-08-27 Thread Steven Stetzler
Hello everyone, I am wondering if there is a way to mount a Kubernetes ConfigMap into a directory in a Spark executor on Kubernetes. Poking around the docs, the only volume mounting options I can find are for a PVC, a directory on the host machine, and an empty volume. I am trying to pass in
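One possible route, since the 2.4-era volume options (PVC, hostPath, emptyDir) don't cover ConfigMaps: Spark 3.0's spark.kubernetes.executor.podTemplateFile accepts an arbitrary executor pod spec, which can declare a ConfigMap volume. A sketch with placeholder names throughout:

    # executor-template.yaml, passed via
    # --conf spark.kubernetes.executor.podTemplateFile=executor-template.yaml
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - name: spark-kubernetes-executor
          volumeMounts:
            - name: app-config
              mountPath: /etc/app-config
      volumes:
        - name: app-config
          configMap:
            name: my-config-map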