Re: Spark structured streaming watermarks on nested attributes

2019-05-06 Thread Yuanjian Li
Hi Joe, I think you hit this issue: https://issues.apache.org/jira/browse/SPARK-27340 You can check the description in the Jira and the PR. We also hit this in our production env and fixed it with the PR provided. The PR is still in review. cc Langchang Zhu(zhuliangch...@baidu.com), who's the author for the

Re: spark-cassandra-connector_2.1 caused java.lang.NoClassDefFoundError under Spark 2.4.2?

2019-05-06 Thread Russell Spitzer
Actually I just checked the release; they only changed the PySpark part. So the download on the website will still be Scala 2.12, so you'll need to build the Scala 2.11 version of Spark if you want to use the connector, or submit a PR for Scala 2.12 support. On Mon, May 6, 2019 at 9:21 PM Russell Spitzer

Re: spark-cassandra-connector_2.1 caused java.lang.NoClassDefFoundError under Spark 2.4.2?

2019-05-06 Thread Russell Spitzer
Spark 2.4.2 was incorrectly released with the default package binaries set to Scala 2.12 instead of Scala 2.11.12, which was supposed to be the case. See the 2.4.3 vote

Re: spark-cassandra-connector_2.1 caused java.lang.NoClassDefFoundError under Spark 2.4.2?

2019-05-06 Thread Richard Xin
Thanks for the reply. Unfortunately this is the highest version available for the Cassandra connector. One thing I don't quite understand is that it worked perfectly under Spark 2.4.0. I thought support for Scala 2.11 only became deprecated starting with Spark 2.4.1, and would be removed after Spark 3.0

Re: spark-cassandra-connector_2.1 caused java.lang.NoClassDefFoundError under Spark 2.4.2?

2019-05-06 Thread Russell Spitzer
Scala version mismatch: Spark is shown at 2.12, while the connector only has a 2.11 release. On Mon, May 6, 2019, 7:59 PM Richard Xin wrote: > org.apache.spark : spark-core_2.12 : 2.4.0 (compile) > org.apache.spark : spark-sql_2.12 : 2.4.0 >

spark-cassandra-connector_2.1 caused java.lang.NoClassDefFoundError under Spark 2.4.2?

2019-05-06 Thread Richard Xin
My Maven dependencies:

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>2.4.0</version>
    <scope>compile</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>2.4.0</version>
  </dependency>
  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.4.1</version>
  </dependency>

When I run spark-submit I got the following exceptions on Spark 2.4.2; it works fine when running spark-submit under
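For reference, a set of coordinates with matching Scala suffixes (a sketch, assuming you stay on the connector's Scala 2.11 build and the versions quoted in the thread) would look like:

```xml
<!-- all artifacts share the _2.11 Scala suffix; versions assumed from the thread -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.4.0</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.4.0</version>
</dependency>
<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector_2.11</artifactId>
  <version>2.4.1</version>
</dependency>
```

The cluster's Spark distribution must also be built for the same Scala version, which is the point made elsewhere in this thread about the 2.4.2 binaries.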

Re: Performance Decrease in spark

2019-05-06 Thread Gourav Sengupta
Hi, can you please share your code? Regards, Gourav On Mon, May 6, 2019 at 8:28 AM yuvraj singh <19yuvrajsing...@gmail.com> wrote: > Hi all, > > We moved from Spark 2.1.2 to 2.3.3 and all our MySQL queries became very slow. > > Please help me with this. > > Thanks > Yubraj Singh

Re: Dynamic metric names

2019-05-06 Thread Sergey Zhemzhitsky
Hi Saisai, Thanks a lot for the link! This is exactly what I need. Just curious why this PR has not been merged, as it seems to implement a rather natural requirement. There are a number of use cases which can benefit from this feature, e.g. - collecting business metrics based on the data's

Image Grep

2019-05-06 Thread swastik mittal
My Spark driver program reads multiple images from HDFS and searches for a particular image by image name. If it finds the image, it converts the received byte array of the image back to its original form. But the image I get after conversion is corrupted. I am using ImageSchema to
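One common cause of "corrupted" output here: ImageSchema rows carry decoded pixel bytes (OpenCV-style, BGR channel order), not the original JPEG/PNG file bytes, so writing the `data` field straight back to an image file produces garbage. A minimal NumPy sketch of reinterpreting the bytes, assuming ImageSchema's height/width/nChannels/data layout:

```python
import numpy as np

def decode_image_row(height, width, n_channels, data):
    """Reinterpret ImageSchema-style raw bytes as an image array.

    The bytes are decoded pixel data (BGR order), so they must be
    reshaped -- not written straight to a .jpg -- to recover the image.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    arr = arr.reshape(height, width, n_channels)
    # Flip BGR -> RGB before handing to PIL/matplotlib for saving/display.
    return arr[:, :, ::-1]

# Hypothetical 2x2 BGR image for illustration.
raw = bytes([255, 0, 0,   0, 255, 0,
             0, 0, 255,  10, 20, 30])
img = decode_image_row(2, 2, 3, raw)
```

The resulting array can then be saved with any image library (e.g. `PIL.Image.fromarray(img)`), which re-encodes it as a valid file.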

Re: Spark structured streaming watermarks on nested attributes

2019-05-06 Thread Joe Ammann
On 5/6/19 6:23 PM, Pat Ferrel wrote: > Streams have no end until watermarked or closed. Joins need bounded datasets, > et voila. Something tells me you should consider the streaming nature of your > data and whether your joins need to use increments/snippets of infinite > streams or to re-join

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Gourav Sengupta
Hi Andrew, Do not misrepresent my statements. I mentioned it depends on the use case; I NEVER (note the word "never") mentioned that Pandas UDFs are ALWAYS (note the word "always") slow. Regards, Gourav Sengupta On Mon, May 6, 2019 at 6:00 PM Andrew Melo wrote: > Hi, > > On Mon, May 6, 2019 at

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Gourav Sengupta
Hence, what I mentioned initially does sound correct? On Mon, May 6, 2019 at 5:43 PM Andrew Melo wrote: > Hi, > > On Mon, May 6, 2019 at 11:41 AM Patrick McCarthy > wrote: > > > > Thanks Gourav. > > > > Incidentally, since the regular UDF is row-wise, we could optimize that > a bit by taking

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Andrew Melo
Hi, On Mon, May 6, 2019 at 11:59 AM Gourav Sengupta wrote: > > Hence, what I mentioned initially does sound correct? I don't agree at all - we've had a significant boost from moving from regular UDFs to pandas UDFs. YMMV, of course. > > On Mon, May 6, 2019 at 5:43 PM Andrew Melo wrote: >> >>

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Andrew Melo
Hi, On Mon, May 6, 2019 at 11:41 AM Patrick McCarthy wrote: > > Thanks Gourav. > > Incidentally, since the regular UDF is row-wise, we could optimize that a bit > by taking the convert() closure and simply making that the UDF. > > Since there's that MGRS object that we have to create too, we

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Patrick McCarthy
Thanks Gourav. Incidentally, since the regular UDF is row-wise, we could optimize that a bit by taking the convert() closure and simply making that the UDF. Since there's that MGRS object that we have to create too, we could probably optimize it further by applying the UDF via rdd.mapPartitions,
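The per-partition optimization Patrick describes can be sketched in plain Python. `ExpensiveConverter` is a hypothetical stand-in for the MGRS object, and the conversion itself is a placeholder; the point is that the costly constructor runs once per partition rather than once per row:

```python
class ExpensiveConverter:
    """Stand-in for a costly-to-construct object (like the MGRS converter)."""
    instances = 0  # track constructions, for illustration only

    def __init__(self):
        ExpensiveConverter.instances += 1

    def convert(self, lat, lon):
        return (round(lat, 2), round(lon, 2))  # placeholder conversion

def convert_partition(rows):
    # One construction per partition, then reused for every row in it.
    conv = ExpensiveConverter()
    for lat, lon in rows:
        yield conv.convert(lat, lon)

# In Spark this would be df.rdd.mapPartitions(convert_partition);
# here we simulate a single in-memory "partition":
out = list(convert_partition([(40.7128, -74.0060), (51.5074, -0.1278)]))
```

With a plain row-wise UDF the converter would be rebuilt (or at least looked up) per row; `mapPartitions` amortizes that cost across the whole partition.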

Re: Spark structured streaming watermarks on nested attributes

2019-05-06 Thread Pat Ferrel
Streams have no end until watermarked or closed. Joins need bounded datasets, et voila. Something tells me you should consider the streaming nature of your data and whether your joins need to use increments/snippets of infinite streams or to re-join the entire contents of the streams accumulated

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Gourav Sengupta
The proof is in the pudding :) On Mon, May 6, 2019 at 2:46 PM Gourav Sengupta wrote: > Hi Patrick, > > super duper, thanks a ton for sharing the code. Can you please confirm > that this runs faster than the regular UDFs? > > Interestingly, I am also running the same transformations using another

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Gourav Sengupta
Hi Patrick, super duper, thanks a ton for sharing the code. Can you please confirm that this runs faster than the regular UDFs? Interestingly, I am also running the same transformations using another geospatial library in Python, where I am passing two fields and getting back an array. Regards,

Spark structured streaming watermarks on nested attributes

2019-05-06 Thread Joe Ammann
Hi all, I'm pretty new to Spark and implementing my first non-trivial structured streaming job with outer joins. My environment is a Hortonworks HDP 3.1 cluster with Spark 2.3.2, working with Python. I understood that I need to provide watermarks and join conditions for left outer joins to

Re: Dynamic metric names

2019-05-06 Thread Saisai Shao
I remember there was a PR about doing a similar thing (https://github.com/apache/spark/pull/18406). From my understanding, this seems like a quite specific requirement; it may require code changes to support your needs. Thanks Saisai Sergey Zhemzhitsky wrote on Sat, May 4, 2019 at 4:44 PM: > Hello Spark
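The "dynamic metric names" idea behind that PR can be sketched in plain Python (this is not Spark's metrics API; the registry and names here are illustrative): instead of declaring every metric up front, counters are created on first access, with names computed from the data at runtime.

```python
from collections import defaultdict

class MetricRegistry:
    """Toy registry where metric names need not be known in advance."""

    def __init__(self):
        self.counters = defaultdict(int)

    def counter(self, name):
        # Creating-on-first-access is what makes the names "dynamic".
        def inc(n=1, _name=name):
            self.counters[_name] += n
        return inc

registry = MetricRegistry()
for record in [{"type": "click"}, {"type": "view"}, {"type": "click"}]:
    # Metric name derived from the record itself, e.g. a business metric.
    registry.counter("events." + record["type"])()
```

This mirrors the use case Sergey mentions: collecting business metrics whose names depend on the data flowing through the job.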

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Patrick McCarthy
Human time is considerably more expensive than computer time, so in that regard, yes :) This took me one minute to write and ran fast enough for my needs. If you're willing to provide a comparable Scala implementation I'd be happy to compare them. @F.pandas_udf(T.StringType(),
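The decorator line above is cut off in the archive. As a standalone illustration of the pattern (not Patrick's actual MGRS code; the grid math below is invented), a pandas UDF body is just a vectorized function on pandas Series, which is why it can be benchmarked outside Spark:

```python
import pandas as pd

def to_grid(lat: pd.Series, lon: pd.Series) -> pd.Series:
    """Vectorized lat/lon -> "grid" string; placeholder for a real
    coordinate conversion like MGRS."""
    return (lat.round().astype(int).astype(str)
            + ","
            + lon.round().astype(int).astype(str))

# In Spark this would be wrapped as F.pandas_udf(T.StringType())(to_grid)
# and applied to whole column batches; standalone it runs on plain Series:
out = to_grid(pd.Series([40.7, 51.4]), pd.Series([-74.0, -0.9]))
```

The speed-up over a row-wise UDF comes from operating on entire Series at once instead of invoking Python per row.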

Re: Anaconda installation with Pyspark/Pyarrow (2.3.0+) on cloudera managed server

2019-05-06 Thread Gourav Sengupta
And you found the Pandas UDF more performant? Can you share your code and prove it? On Sun, May 5, 2019 at 9:24 PM Patrick McCarthy wrote: > I disagree that it's hype. Perhaps not 1:1 with pure Scala > performance-wise, but for python-based data scientists or others with a lot > of python

Re: Deep Learning with Spark, what is your experience?

2019-05-06 Thread Gourav Sengupta
The main concern is around the model and its accuracy, and then fitting all that CI/CD hype around it. On Sun, May 5, 2019 at 10:37 PM Riccardo Ferrari wrote: > Thanks everyone, I really appreciate your contributions here. > > @Jason, thanks for the references, I'll take a look. Quickly

Performance Decrease in spark

2019-05-06 Thread yuvraj singh
Hi all, We moved from Spark 2.1.2 to 2.3.3 and all our MySQL queries became very slow. Please help me with this. Thanks Yubraj Singh