Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
@Alfie Davidson : Awesome, it worked with "org.elasticsearch.spark.sql". But as soon as I switched to elasticsearch-spark-20_2.12, "es" also worked. On Fri, Sep 8, 2023 at 12:45 PM Dipayan Dev wrote: > > Let me try that and get back. Just wondering, if there is a ch
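For readers following this thread, a minimal sketch of the two interchangeable source names, assuming the Spark 3 connector artifact (elasticsearch-spark-30_2.12) is on the runtime classpath; the host and index names below are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("es-write-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(("doc-1", "hello")).toDF("id", "message")

    // Full data source name: resolvable whenever the connector jar is on the classpath.
    df.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "es-host:9200")   // placeholder host
      .mode("append")
      .save("index_name")                   // placeholder index name

    // Short name registered by the connector. If the jar is missing at runtime,
    // Spark falls back to looking for a class literally named "es.DefaultSource",
    // which is the ClassNotFoundException reported earlier in this thread.
    df.write
      .format("es")
      .option("es.nodes", "es-host:9200")
      .mode("append")
      .save("index_name")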

Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
t; > Sent from my iPhone > > On 8 Sep 2023, at 03:10, Dipayan Dev wrote: > >  > > ++ Dev > > On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev > wrote: > >> Hi, >> >> Can you please elaborate your last response? I don’t have any external >> depende

Re: Elasticsearch support for Spark 3.x

2023-09-07 Thread Dipayan Dev
s also not somehow already provided by your spark cluster (this > is what it means), then yeah this is not anywhere on the class path at > runtime. Remove the provided scope. > > On Thu, Sep 7, 2023, 4:09 PM Dipayan Dev wrote: > >> Hi, >> >> Can you please elabora

Re: Elasticsearch support for Spark 3.x

2023-09-07 Thread Dipayan Dev
n, Aug 27, 2023 at 2:58 PM Dipayan Dev wrote: > >> Using the following dependency for Spark 3 in the POM file (my Scala version >> is 2.12.14): >> org.elasticsearch : elasticsearch-spark-30_2.12 : 7.12.0

Re: Elasticsearch support for Spark 3.x

2023-09-07 Thread Dipayan Dev
++ Dev On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev wrote: > Hi, > > Can you please elaborate on your last response? I don’t have any external > dependencies added, and just updated the Spark version as mentioned below. > > Can someone help me with this? > > On Fri, 1 Se

Re: Elasticsearch support for Spark 3.x

2023-08-27 Thread Dipayan Dev
ndex_name") The same code is working with Spark 2.4.0 and the following dependency *org.elasticsearch elasticsearch-spark-20_2.12 7.12.0* On Mon, 28 Aug 2023 at 12:17 AM, Holden Karau wrote: > What’s the version of the ES connector you are using? > > On Sat, Aug 26, 2023 at

Elasticsearch support for Spark 3.x

2023-08-26 Thread Dipayan Dev
Hi All, We're using Spark 2.4.x to write a DataFrame into the Elasticsearch index. As we're upgrading to Spark 3.3.0, it throws the following error: Caused by: java.lang.ClassNotFoundException: es.DefaultSource at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476) at

Unsubscribe

2023-08-25 Thread Dipayan Dev

Unsubscribe

2023-08-23 Thread Dipayan Dev
Unsubscribe

Unsubscribe

2023-08-21 Thread Dipayan Dev
-- With Best Regards, Dipayan Dev Author of *Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-Dev/dp/1787124762>* M.Tech (AI), IISc, Bangalore

Spark doesn’t create SUCCESS file when external path is passed

2023-08-21 Thread Dipayan Dev
on the SUCCESS file. Please let me know if this is a bug or if I need to add any additional configuration to fix this in Spark 3.3.0. Happy to contribute if you have suggestions. -- With Best Regards, Dipayan Dev Author of *Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-
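A minimal sketch of the write pattern described in this thread, with hypothetical bucket and table names; the _SUCCESS marker is only produced when the Hadoop-side flag mapreduce.fileoutputcommitter.marksuccessfuljobs is true (its default), so that setting is worth confirming before treating the missing file as a bug:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("success-marker-check")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // _SUCCESS is written by the committer only when this is true (it is by default).
    spark.sparkContext.hadoopConfiguration
      .setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", true)

    // Hypothetical external table location standing in for the real one.
    Seq(("a", 1)).toDF("name", "num").write
      .option("path", "gs://some_bucket/some_table/")
      .mode(SaveMode.Overwrite)
      .format("orc")
      .saveAsTable("some_db.some_table")
    // After the job completes, check for gs://some_bucket/some_table/_SUCCESS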

Re: Probable Spark Bug while inserting into flat GCS bucket?

2023-08-20 Thread Dipayan Dev
Hi Mich, It's not specific to ORC; it looks like a bug in the Hadoop Common project. I have raised a JIRA and am happy to contribute a fix for Hadoop 3.3.0. Do you know if anyone could help me set the Assignee? https://issues.apache.org/jira/browse/HADOOP-18856 With Best Regards, Dipayan Dev

Probable Spark Bug while inserting into flat GCS bucket?

2023-08-19 Thread Dipayan Dev
;).map(x => x)
DF.write.option("path", "gs://test_dd1/abc/").mode(SaveMode.Overwrite).partitionBy(partKey: _*).format("orc").saveAsTable("us_wm_supply_chain_otif_stg.test_tb2")
val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb2")
With Best Regards, Dipayan Dev
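A self-contained version of the snippet above, with hypothetical bucket, database, and partition-column names filled in for the parts that were truncated; it creates the partitioned external ORC table on GCS with saveAsTable and then writes into it with insertInto, the step the thread reports misbehaving:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("gcs-insertinto-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val partKey = Seq("num")   // hypothetical partition column; the real one is truncated above

    // Create the external, partitioned ORC table under a GCS path.
    val DF = Seq(("test1", 123)).toDF("name", "num")
    DF.write
      .option("path", "gs://some_bucket/abc/")   // hypothetical bucket
      .mode(SaveMode.Overwrite)
      .partitionBy(partKey: _*)
      .format("orc")
      .saveAsTable("some_db.test_tb2")           // hypothetical database

    // Write into the same table; insertInto resolves columns by position, with
    // partition columns last, which the (name, num) ordering satisfies here.
    val DF1 = Seq(("test2", 125)).toDF("name", "num")
    DF1.write
      .mode(SaveMode.Overwrite)
      .format("orc")
      .insertInto("some_db.test_tb2")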

[no subject]

2023-08-18 Thread Dipayan Dev
Unsubscribe -- With Best Regards, Dipayan Dev Author of *Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-Dev/dp/1787124762>* M.Tech (AI), IISc, Bangalore

Re: Spark File Output Committer algorithm for GCS

2023-07-21 Thread Dipayan Dev
rious what this actually does? With Best Regards, Dipayan Dev On Wed, Jul 19, 2023 at 2:25 PM Dipayan Dev wrote: > Thank you. Will try out these options. > > > > With Best Regards, > > > > On Wed, Jul 19, 2023 at 1:40 PM Mich Talebzadeh > wrote: > >>

Re: Spark File Output Committer algorithm for GCS

2023-07-19 Thread Dipayan Dev

Re: Spark File Output Committer algorithm for GCS

2023-07-18 Thread Dipayan Dev
as it deletes and copies the partitions. My issue is something related to this - https://groups.google.com/g/cloud-dataproc-discuss/c/neMyhytlfyg?pli=1 With Best Regards, Dipayan Dev On Wed, Jul 19, 2023 at 12:06 AM Mich Talebzadeh wrote: > Spark has no role in creating that hive stag

Re: Spark File Output Committer algorithm for GCS

2023-07-18 Thread Dipayan Dev
at 9:47 PM, Dipayan Dev wrote: > Thanks Jay. Is there any suggestion on how much I can increase those > parameters? > > On Mon, 17 Jul 2023 at 8:25 PM, Jay wrote: > >> Fileoutputcommitter v2 is supported in GCS but the rename is a metadata >> copy and delete ope

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
updates in Spark. >> >> >> On Mon, 17 Jul 2023 at 7:05 PM, Jay wrote: >> >>> You can try increasing fs.gs.batch.threads and >>> fs.gs.max.requests.per.batch. >>> >>> The definitions for these flags are available here - >>> https://gith

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
ing fs.gs.batch.threads and > fs.gs.max.requests.per.batch. > > The definitions for these flags are available here - > https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/CONFIGURATION.md > > On Mon, 17 Jul 2023 at 14:59, Dipayan Dev wrote: > >> No, I am using Spark 2.
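As a reference for anyone trying this, a small sketch of raising those two GCS connector properties from Spark code; the values are illustrative only, and the CONFIGURATION.md linked above remains the authoritative description:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("gcs-batch-tuning").getOrCreate()

    // GCS connector batching knobs mentioned above; the values are examples, not recommendations.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.setInt("fs.gs.batch.threads", 30)
    hadoopConf.setInt("fs.gs.max.requests.per.batch", 30)

    // The same settings can also be passed at submit time as
    // spark.hadoop.fs.gs.batch.threads and spark.hadoop.fs.gs.max.requests.per.batch.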

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
on dates and I need to update around 3 years of data. It usually takes 3 hours to finish the process. Any way to speed this up? With Best Regards, Dipayan Dev On Mon, Jul 17, 2023 at 1:53 PM Mich Talebzadeh wrote: > So you are using GCP and your Hive is installed on Dataproc which happens >

Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
and cons of using this version? Or is there any ongoing Spark feature development to address this issue? With Best Regards, Dipayan Dev
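For context on the committer question, a hedged sketch of selecting algorithm version 2; broadly, v1 renames task output into place during a serial job-commit phase, while v2 renames it at task commit, which is faster on object stores such as GCS but leaves partial output visible if the job fails midway:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("committer-v2-sketch")
      // FileOutputCommitter algorithm: 1 is the usual default; 2 commits each task's
      // output directly into the destination directory at task-commit time.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()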

Contributing to Spark MLLib

2023-07-16 Thread Dipayan Dev
? Are there any new features planned, and what is the best way to explore this? Looking forward to a little guidance to get started. Thanks Dipayan -- With Best Regards, Dipayan Dev Author of *Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-Dev/dp/1787124762>* M.Tech (AI)