Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Maxim Gekk
Thank you, Chao! On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim wrote: > Thanks Chao for driving the release! > > On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan wrote: > >> Thanks, Chao! >> >> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun wrote: >> >>> We are happy to announce the availability of

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Maxim Gekk
Congratulations, everyone, on the new release, and thanks to Yuming for his efforts. Maxim Gekk Software Engineer Databricks, Inc. On Wed, Oct 26, 2022 at 10:14 AM Hyukjin Kwon wrote: > Thanks, Yuming. > > On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh wrote: > >> Tha

Re: Apache Spark

2021-01-26 Thread Maxim Gekk
Hi Андрей, You can write to https://databricks.com/company/contact . We can probably offer you something. For instance, Databricks has an OEM program that might interest you: https://partners.databricks.com/prm/English/c/Overview Maxim Gekk Software Engineer Databricks, Inc

Re: Spark 3.0.1 new Proleptic Gregorian calendar

2020-11-19 Thread Maxim Gekk
cannot determine who wrote the parquet files and which calendar was used while saving the files. Starting from version 2.4.6, Spark saves metadata to parquet files, and Spark 3.0 can infer the mode automatically. Maxim Gekk Software Engineer Databricks, Inc. On Thu, Nov 19, 2020 at 8:10 PM S

Re: Spark 3.0 almost 1000 times slower to read json than Spark 2.4

2020-06-29 Thread Maxim Gekk
Hello Sanjeev, It is hard to troubleshoot the issue without input files. Could you open a JIRA ticket at https://issues.apache.org/jira/projects/SPARK and attach the JSON files there (or samples, or code that generates the JSON files)? Maxim Gekk Software Engineer Databricks, Inc. On Mon, Jun

Re: Better way to debug serializable issues

2020-02-18 Thread Maxim Gekk
while creating a cluster: spark.driver.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true Maxim Gekk Software Engineer Databricks, Inc. On Tue, Feb 18, 2020 at 1:02 PM Ruijing Li wrote: > Hi all, >
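
A minimal sketch of passing the same two options per job instead of at cluster creation, assuming the job is launched with `spark-submit` and its `--conf key=value` form. The helper names and `app.py` are hypothetical and used only for illustration.

```python
import shlex

# JVM flag quoted in the message above; it makes the JVM print the object
# path that led to a NotSerializableException.
DEBUG_FLAG = "-Dsun.io.serialization.extendedDebugInfo=true"


def serialization_debug_conf():
    """Return the two Spark settings from the message as a dict."""
    return {
        "spark.driver.extraJavaOptions": DEBUG_FLAG,
        "spark.executor.extraJavaOptions": DEBUG_FLAG,
    }


def to_spark_submit_args(conf):
    """Turn a settings dict into a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "app.py" is a placeholder application name (an assumption).
    command = ["spark-submit", *to_spark_submit_args(serialization_debug_conf()), "app.py"]
    print(shlex.join(command))
```

If either option is already set for the job, the flag would need to be appended to the existing value rather than replacing it.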

Re: How to access line fileName in loading file using the textFile method

2018-09-24 Thread Maxim Gekk
> using sc.textFile("path/*"), how can I understand each data is for which > file? > > Is it possible (and needed) to customize the textFile method? > -- Maxim Gekk Technical Solutions Lead Databricks Inc. maxim.g...@databricks.com databricks.com <http://databricks.com/>

Re: How to read multiple libsvm files in Spark?

2018-09-24 Thread Maxim Gekk
Hi, > Any other alternatives? Manually form the input path by combining multiple paths via commas. See https://issues.apache.org/jira/browse/SPARK-12086 On Thu, Sep 20, 2018 at 12:47 PM Md. Rezaul Karim < rezaul.ka...@insight-centre.org> wrote: > I'm experiencing "Exception in thread "main"
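
A minimal sketch of forming one input path string from several paths by hand. It assumes a comma as the separator, which is the form `SparkContext.textFile` accepts for multiple inputs. Whether a given libsvm reader accepts the combined string is also an assumption to verify against the Spark version in use. The function name and the example paths are hypothetical.

```python
def combine_input_paths(paths):
    """Join several input paths into one comma-separated path string."""
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [p.strip() for p in paths if p.strip()]
    if not cleaned:
        raise ValueError("at least one input path is required")
    for p in cleaned:
        # A comma inside a single path would be read as a separator.
        if "," in p:
            raise ValueError(f"path must not contain a comma: {p!r}")
    return ",".join(cleaned)


if __name__ == "__main__":
    # Placeholder file names, not taken from the thread.
    print(combine_input_paths(["data/part1.libsvm", "data/part2.libsvm"]))
```

The combined string can then be tried wherever a single path is expected, for example as the argument to `sc.textFile`.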

Re: from_json function

2018-08-15 Thread Maxim Gekk
"data.number", "data._corrupt_record") > > jsonedDf.show() > ``` > Can anybody help me get a non-empty `_corrupt_record`? > > Thanks in advance. > > > > -- > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > > -- Maxim Gekk Technical Solutions Lead Databricks Inc. maxim.g...@databricks.com databricks.com <http://databricks.com/>