Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Aakash Basu
Hey Shuporno, Thanks for the prompt reply, and for noticing the silly mistake. I tried this out, but am still getting another error, which seems related to connectivity:

>>> hadoop_conf.set("fs.s3a.awsAccessKeyId", "abcd")
>>> hadoop_conf.set("fs.s3a.awsSecretAccessKey", "123abc")
>>> a

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Shuporno Choudhury
On Fri, 21 Dec 2018 at 12:47, Shuporno Choudhury < shuporno.choudh...@gmail.com> wrote:
> Hi,
> Your connection config uses 's3n' but your read command uses 's3a'. The config keys for s3a are:
> spark.hadoop.fs.s3a.access.key
> spark.hadoop.fs.s3a.secret.key
> I feel this should solve the problem.
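
A minimal PySpark sketch of this fix, set through the Hadoop configuration as in the original snippet; the credentials and bucket name are placeholders, not values from the thread:

    # Corrected s3a key names, per the reply above
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

    df = spark.read.csv("s3a://your-bucket/your-file.csv", header=True)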

Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Aakash Basu
Hi, I am trying to connect to AWS S3 and read a csv file (running a POC) from a bucket. I have s3cmd installed and am able to run ls and other operations from the CLI. *Present Configuration:* Python 3.7, Spark 2.3.1. *JARs added:* hadoop-aws-2.7.3.jar (in sync with the hadoop version used with spark)
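
For reference, a sketch of launching PySpark with the S3 jars on the classpath; the paths are hypothetical, and the aws-java-sdk 1.7.4 pairing is an assumption (it is the SDK version hadoop-aws 2.7.x was built against):

    import os
    # Must be set before the SparkSession is created
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--jars /path/to/hadoop-aws-2.7.3.jar,/path/to/aws-java-sdk-1.7.4.jar "
        "pyspark-shell"
    )
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("s3-poc").getOrCreate()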

Re:running updates using SPARK

2018-12-20 Thread 大啊
I think Spark is a computation engine designed for OLAP or ad-hoc queries. Spark is not a traditional relational database; UPDATE needs mandatory constraints like transactions and locks. At 2018-12-21 06:05:54, "Gourav Sengupta" wrote:
> Hi, Is there any time soon that SPARK will support UPDATES?
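
Since plain Spark has no UPDATE, a common workaround is to rewrite the affected data; a minimal sketch, with hypothetical paths and columns:

    from pyspark.sql import functions as F

    # Emulate "UPDATE users SET status = 'inactive' WHERE id = 42" by rewriting
    df = spark.read.parquet("/data/users")        # hypothetical input path
    updated = df.withColumn(
        "status",
        F.when(F.col("id") == 42, F.lit("inactive")).otherwise(F.col("status")),
    )
    # Write to a new location; overwriting the path being read is unsafe
    updated.write.mode("overwrite").parquet("/data/users_v2")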

Re:Re: [Spark SQL]use zstd, No enum constant parquet.hadoop.metadata.CompressionCodecName.ZSTD

2018-12-20 Thread 大啊
I think your Hive table is using that CompressionCodecName, but the parquet-hadoop-bundle.jar in your Spark classpath is not a correct version. At 2018-12-21 12:07:22, "Jiaan Geng" wrote:
> I think your hive table using CompressionCodecName, but your
> parquet-hadoop-bundle.jar in spark classpaths is

Re: Spark not working with Hadoop 4mc compression

2018-12-20 Thread Jiaan Geng
I think com.hadoop.compression.lzo.LzoCodec is not on the Spark classpath; please put a suitable hadoop-lzo.jar into the directory spark_home/jars/.
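
As an alternative to copying into spark_home/jars/, the jar can be supplied at session start; a sketch, with a hypothetical jar path and version:

    from pyspark.sql import SparkSession

    # Make hadoop-lzo available on the driver and executor classpaths
    spark = (SparkSession.builder
             .appName("lzo-demo")
             .config("spark.jars", "/opt/jars/hadoop-lzo-0.4.20.jar")
             .getOrCreate())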

Re: [Spark SQL]use zstd, No enum constant parquet.hadoop.metadata.CompressionCodecName.ZSTD

2018-12-20 Thread Jiaan Geng
I think your Hive table is using that CompressionCodecName, but the parquet-hadoop-bundle.jar in your Spark classpath is not a correct version.
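
One way to check which codec names the Parquet classes on the classpath actually know, sketched via py4j; the parquet.* package name is taken from the error in the subject, and this is a diagnostic assumption rather than a documented API:

    # If ZSTD is absent from the output, the jar predates zstd support
    jvm = spark.sparkContext._jvm
    codec_enum = jvm.parquet.hadoop.metadata.CompressionCodecName
    print([c.toString() for c in codec_enum.values()])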

Re: running updates using SPARK

2018-12-20 Thread Jiaan Geng
I think Spark is a computation engine designed for OLAP or ad-hoc queries. Spark is not a traditional relational database; UPDATE needs mandatory constraints like transactions and locks.

Re: Multiple sessions in one application?

2018-12-20 Thread Jiaan Geng
This scenario is rare. You may need it when you provide a web server on top of Spark.
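
For reference, a minimal sketch of what an isolated session looks like in PySpark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-session-demo").getOrCreate()
    other = spark.newSession()  # shares the SparkContext, isolates temp views/confs

    spark.range(5).createOrReplaceTempView("t")
    print([t.name for t in spark.catalog.listTables()])  # contains 't'
    print([t.name for t in other.catalog.listTables()])  # empty: temp views are per-session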

Re:Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread prosp4300
Thanks a lot for the explanation. Spark declares the Sink trait as package-private, which is why the package looks weird; the metrics system does not seem intended to be extended:

    package org.apache.spark.metrics.sink
    private[spark] trait Sink

Make the custom sink class available on every executor

running updates using SPARK

2018-12-20 Thread Gourav Sengupta
Hi, Is there any time soon that SPARK will support UPDATES? Databricks does provide Delta, which supports UPDATE, but I think that open source SPARK does not have the UPDATE option. HIVE has supported UPDATES for a very long time now, and I was wondering when that would become

Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread Marcelo Vanzin
First, it's really weird to use "org.apache.spark" for a class that is not in Spark. For executors, the jar file of the sink needs to be in the system classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of
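
A sketch of one such way, assuming the sink has been packaged in its own jar and placed at a node-local path (the path is hypothetical and must exist on every node):

    from pyspark.sql import SparkSession

    # Put the sink jar on the executors' *system* classpath
    spark = (SparkSession.builder
             .config("spark.executor.extraClassPath", "/opt/jars/http-sink.jar")
             .getOrCreate())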

Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread prosp4300
Hi, Spark Users. I'm playing with Spark metric monitoring, and want to add a custom sink, an HttpSink that sends the metrics through a RESTful API. A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and packaged within the application jar. It works for the driver instance, but once
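
For context, a sketch of how such a sink is typically registered; Spark also reads metrics properties from SparkConf keys with the spark.metrics.conf. prefix (the class name is from this post, the jar deployment issue is covered in the reply above):

    from pyspark import SparkConf

    # Equivalent to an entry in conf/metrics.properties, for all instances
    conf = SparkConf().set(
        "spark.metrics.conf.*.sink.http.class",
        "org.apache.spark.metrics.sink.HttpSink",
    )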

[Spark cluster standalone v2.4.0] - problems with reverse proxy functionality regarding submitted applications in cluster mode and the spark history server ui

2018-12-20 Thread Cheikh_SOW
Hello, I have many Spark clusters in standalone mode with 3 nodes each. One of them is in HA mode with 3 masters and 3 workers, and everything regarding the HA is working fine. The second one is not in HA mode and has one master and 3 workers. In both of them, I have configured the reverse proxy

Re: Problem running Spark on Kubernetes: Certificate error

2018-12-20 Thread Steven Stetzler
Hi Matt, Thank you for the help. This worked for me. For posterity: convert the data in the certificate-authority-data field of your DigitalOcean Kubernetes configuration file, which is downloaded from their site, from base64 to PEM format and save it to a file ca.crt, then submit with --conf
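
A sketch of that decoding step in Python; the filename is hypothetical, the field layout is the standard kubeconfig structure, and PyYAML is assumed to be installed:

    import base64
    import yaml

    # Decode the kubeconfig's certificate-authority-data field into a PEM file
    with open("kubeconfig.yaml") as f:       # hypothetical filename
        cfg = yaml.safe_load(f)
    b64 = cfg["clusters"][0]["cluster"]["certificate-authority-data"]
    with open("ca.crt", "wb") as f:
        f.write(base64.b64decode(b64))

The --conf in the original message is truncated; spark.kubernetes.authenticate.submission.caCertFile is the likely option the resulting ca.crt was passed to.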

Re: Spark job on dataproc failing with Exception in thread "main" java.lang.NoSuchMethodError: com.googl

2018-12-20 Thread Muthu Jayakumar
The error reads as though the Precondition.checkArgument() method is being called with an incorrect parameter signature. Could you check how many jars (besides the Uber jar) actually contain this method signature? I smell a jar version conflict or something similar. Thanks Muthu On Thu, Dec 20, 2018, 02:40 Mich
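
A quick sketch of scanning dependency jars for the class in question; the directory is hypothetical, and the Guava class path is an assumption based on the com.googl... truncation in the subject:

    import glob
    import zipfile

    # Print every jar that bundles the (assumed) Guava Preconditions class
    target = "com/google/common/base/Preconditions.class"
    for jar in sorted(glob.glob("/path/to/deps/*.jar")):  # hypothetical directory
        with zipfile.ZipFile(jar) as zf:
            if target in zf.namelist():
                print(jar)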

Re: Spark job on dataproc failing with Exception in thread "main" java.lang.NoSuchMethodError: com.googl

2018-12-20 Thread Mich Talebzadeh
Has anyone in the Spark user group seen this error? Thanks Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com

Re: [SPARK SQL] Difference between 'Hive on spark' and Spark SQL

2018-12-20 Thread Jörn Franke
If you already have a lot of queries, then it makes sense to look at Hive (in a recent version) + Tez + LLAP, with all tables in ORC format, partitioned and sorted on filter columns. That would be the easiest way and can improve performance significantly. If you want to use Spark, eg because you
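
If the Spark route is taken instead, a minimal sketch of writing such a table from Spark; the table and column names are hypothetical:

    # Partitioned ORC table, sorted within files on a filter column so that
    # ORC min/max indexes can prune reads
    (df.sortWithinPartitions("customer_id")
       .write.mode("overwrite")
       .partitionBy("event_date")
       .format("orc")
       .saveAsTable("events_orc"))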