Re: [pyspark] Load a master data file to spark ecosystem

2020-04-24 Thread Sonal Goyal
How does your tree_lookup_value function work? Thanks, Sonal Nube Technologies On Fri, Apr 24, 2020 at 8:47 PM Arjun Chundiran wrote: > Hi Team, > > I have asked this question on Stack Overflow >

Watch "Airbus makes more of the sky with Spark - Jesse Anderson & Hassene Ben Salem" on YouTube

2020-04-24 Thread Zahid Rahman
https://youtu.be/sYlbD_OoHhs Backbutton.co.uk ¯\_(ツ)_/¯ ♡۶Java♡۶RMI ♡۶ Make Use Method {MUM} makeuse.org

Re: [Meta] Moderation request diversion?

2020-04-24 Thread Jeff Evans
Thanks, Sean; much appreciated. On Fri, Apr 24, 2020 at 1:09 PM Sean Owen wrote: > The mailing lists are operated by the ASF. I've asked whether it's > possible here: https://issues.apache.org/jira/browse/INFRA-20186 > > On Fri, Apr 24, 2020 at 12:39 PM Jeff Evans > wrote: > > > > Still

Re: [Meta] Moderation request diversion?

2020-04-24 Thread Sean Owen
The mailing lists are operated by the ASF. I've asked whether it's possible here: https://issues.apache.org/jira/browse/INFRA-20186 On Fri, Apr 24, 2020 at 12:39 PM Jeff Evans wrote: > > Still noticing this problem quite a bit, both on the user and dev lists. I > notice that it appears to be

Re: [Meta] Moderation request diversion?

2020-04-24 Thread Jeff Evans
Still noticing this problem quite a bit, both on the user and dev lists. I notice that it appears to be using ezmlm as the software. Is there any chance the list owner (someone at Databricks?) can take a look at restricting messages based on seeing the word "Unsubscribe" in the subject? It

unsubscribe

2020-04-24 Thread vijay krishna

[pyspark] Load a master data file to spark ecosystem

2020-04-24 Thread Arjun Chundiran
Hi Team, I have asked this question on Stack Overflow and I didn't really get any convincing answers. Can somebody help me to solve this issue? Below is my problem. While building a log processing system, I

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-24 Thread Waleed Fateem
Are you running this in local mode? If not, are you even sure that the hanging is occurring on the driver's side? Did you check the Spark UI to see if there is a straggler task or not? If you do have a straggler/hanging task, and in case this is not an application running in local mode then you

[Structured Streaming] Event-Time ordering of two Kafka topics with different message volumes

2020-04-24 Thread eiise
Hello, we are currently struggling to bring order into a merge of two streaming data sets from different (keyed -> so we can assume an order per partition/key) Kafka sources, with 4 partitions each, using the Java API. On both topics we receive messages that contain varying field-subsets of our

Re: Spark ORC store written timestamp as column

2020-04-24 Thread ZHANG Wei
From what I think I understand, the OrcOutputWriter leverages orc-core to write. I'm wondering if ORC supports the row metadata or not. If not, maybe the org.apache.orc.Writer::addRowBatch() can be overridden to record the metadata after the RowBatch is written. -- Cheers, -z On Thu, 16 Apr 2020

Why when writing Parquet files, columns are converted to nullable?

2020-04-24 Thread Julien Benoit
Hi, Spark documentation says: "When writing Parquet files, all columns are automatically converted to be nullable for compatibility reasons." Could you elaborate on the reasons for this choice? Is this for a similar reason as Protobuf which gets rid of "required" fields in version 3, since

Re: 30000 partitions vs 1000 partitions with Coalescing

2020-04-24 Thread Roland Johann
Hi Adnan, coalescing involves network shuffle to other executors. How many executors are configured for that job? Best regards Roland Johann Software Developer/Data Engineer phenetic GmbH Lütticher Straße 10, 50674 Köln, Germany Mobil: +49 172 365 26 46 Mail: roland.joh...@phenetic.io Web:

Re: Save Spark dataframe as dynamic partitioned table in Hive

2020-04-24 Thread ZHANG Wei
AFAICT, we can use spark.sql(s"select $name ..."), where name is a value in the Scala context [1]. -- Cheers, -z [1] https://docs.scala-lang.org/overviews/core/string-interpolation.html On Fri, 17 Apr 2020 00:10:59 +0100 Mich Talebzadeh wrote: > Thanks Patrick, > > The partition broadcastId is static