Re: Using Avro file format with SparkSQL

2022-02-09 Thread frakass
Have you added the dependency to build.sbt? Can you run 'sbt package' on the source successfully? Regards, frakass. On 2022/2/10 11:25, Karanika, Anna wrote: For context, I am invoking spark-submit and adding the arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
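
For reference, a minimal sketch of the build.sbt equivalent of that --packages flag (the 3.2.0 version is taken from the thread; adjust it to match your Spark build):

    // build.sbt -- compile-time equivalent of --packages org.apache.spark:spark-avro_2.12:3.2.0
    libraryDependencies += "org.apache.spark" %% "spark-avro" % "3.2.0"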

Using Avro file format with SparkSQL

2022-02-09 Thread Karanika, Anna
Hello, I have been trying to use Spark SQL's operations that are related to the Avro file format, e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace: Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source:
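
For context, a minimal sketch of reading and writing Avro once spark-avro is on the classpath (the session setup and paths are illustrative, not from the thread):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("AvroExample").getOrCreate()
    // the "avro" short name only resolves when the spark-avro package is loaded;
    // otherwise Spark throws the AnalysisException quoted above
    val df = spark.read.format("avro").load("/tmp/input.avro")  // hypothetical path
    df.write.format("avro").save("/tmp/output")                 // hypothetical path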

Re: StructuredStreaming - foreach/foreachBatch

2022-02-09 Thread karan alang
Thanks, Mich, will check it out. Regards, Karan Alang. On Tue, Feb 8, 2022 at 3:06 PM Mich Talebzadeh wrote: > BTW you can check this LinkedIn article of mine on Processing Change Data Capture with Spark Structured Streaming >
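
For later readers of this thread, a minimal hedged sketch of foreachBatch; the rate source and sink path are made up for illustration:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("ForeachBatchDemo").getOrCreate()
    // hypothetical source: a rate stream, just to have something streaming
    val stream = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

    val query = stream.writeStream
      .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
        // each micro-batch arrives as a plain DataFrame, so any batch writer works here
        batchDF.write.mode("append").parquet(s"/tmp/sink/batch=$batchId")  // hypothetical path
      }
      .start()
    query.awaitTermination()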

Re: Unsubscribe

2022-02-09 Thread Bitfox
Please send an e-mail to user-unsubscr...@spark.apache.org to unsubscribe yourself from the mailing list. On Thu, Feb 10, 2022 at 1:38 AM Yogitha Ramanathan wrote: >

Unsubscribe

2022-02-09 Thread Yogitha Ramanathan

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Mich Talebzadeh
What is the issue you are encountering? Memory bound? Is it GCP nodes (are you using a Dataproc cluster)? Have you checked the logs in GCP? How about the Spark GUI, what does it say? With a two-node cluster you sound like you are doing more of

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Andrew Davidson
Hi Sean, I have 2 big for loops in my code. One for loop uses join to implement R’s cbind(); the other implements R’s rowsum(). Each for loop iterates 10411 times. To debug, I added an action to each iteration of the loop. I think I used count() and logged the results. So I am confident
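
For readers searching the archive: R's rowsum(x, group) sums each column within groups of rows, which maps to a single groupBy/sum in Spark rather than a 10411-iteration loop. A minimal sketch, with made-up column names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("RowsumSketch").getOrCreate()
    val df = spark.createDataFrame(Seq(
      ("g1", 1.0, 2.0),
      ("g1", 3.0, 4.0),
      ("g2", 5.0, 6.0)
    )).toDF("group", "c1", "c2")

    // one shuffle replaces the per-group loop
    df.groupBy("group").sum("c1", "c2").show()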

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Sean Owen
It really depends on what is running out of memory. You can have all the workers in the world, but if something is blowing up the driver, they won't do anything. You can have a huge cluster, but data skew can make it impossible to break up the problem you express. Spark running out of mem is not the same as

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Andrew Davidson
Hi Sean, Debugging big data projects is always hard. It is a black art that takes a lot of experience. Can you tell me more about “Why you're running out of mem is probably more a function of your parallelism, cluster size”? I have a cluster with 2 worker nodes, each with 1.4 TB of memory, 96
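
Not an answer from the thread, but a hedged sketch of the knobs Sean is referring to: the partition count, not node memory, decides how finely the work is split. The 192 below merely assumes the truncated "96" means 96 cores per node:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.read.parquet("/tmp/counts")             // hypothetical input
    println(df.rdd.getNumPartitions)                       // how finely the data is currently split
    val evenly = df.repartition(192)                       // e.g. 2 nodes x 96 cores, fully illustrative
    spark.conf.set("spark.sql.shuffle.partitions", "192")  // partition count after joins/groupBys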

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Bitfox
Hi, I am not sure about the overall situation, but if you want a Scala solution, I think you could use a regex to match and capture the keywords. Here is one I wrote that you can modify on your end. import scala.io.Source import scala.collection.mutable.ArrayBuffer val list1 =
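
The quoted code is cut off by the archive; a self-contained sketch in the same spirit, with invented sample lines and pattern:

    import scala.collection.mutable.ArrayBuffer

    // made-up lines standing in for the unstructured file
    val lines = Seq("Plano: Especial", "Nome: Maria", "no keyword on this line")
    val pattern = """(\w+):\s+(.*)""".r   // capture "key: value" pairs

    val pairs = ArrayBuffer[(String, String)]()
    for (line <- lines) line match {
      case pattern(key, value) => pairs += ((key, value))
      case _                   => // ignore lines that do not match
    }
    pairs.foreach(println)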

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
Hello, how are you? Thanks for your time. > Does the data contain records? Yes. > Are the records "homogeneous", i.e., do they have the same fields? Yes, the data is homogeneous but has “two layouts” in the same file. > What is the format of the data? All data is strings in a .txt file. > Are records

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
Hello, Yes, for this block I can open it as CSV with '#' as the delimiter, but there is a block that is not in CSV format; that one is likely key-value. We have two different layouts in the same file. That is the “problem”. Thanks for your time. > Relação de Beneficiários Ativos e Excluídos (List of Active and Excluded Beneficiaries) > Carteira
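
For the archive, a hedged sketch of one way to handle the two layouts: read the file as plain text and route lines by shape before parsing. The '#' heuristic and the path are assumptions, not from the thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("TwoLayouts").getOrCreate()
    import spark.implicits._

    val raw = spark.read.textFile("/tmp/beneficiarios.txt")   // hypothetical path
    val csvLines = raw.filter(_.contains("#"))                // assumed: the '#'-delimited layout
    val kvLines  = raw.filter(l => !l.contains("#") && l.contains(":"))  // assumed: the key-value layout

    // split the CSV-shaped lines on '#' into arrays of fields
    val csvDF = csvLines.map(_.split("#").map(_.trim)).toDF("fields")
    csvDF.show(truncate = false)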

Re: flatMap for dataframe

2022-02-09 Thread Khalid Mammadov
One way is to split -> explode -> pivot. These are Column and DataFrame methods. Here are quick examples from the web: https://www.google.com/amp/s/sparkbyexamples.com/spark/spark-split-dataframe-column-into-multiple-columns/amp/
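
A quick hedged illustration of the split -> explode step, with invented column names; see the linked article for the pivot part:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode, split}

    val spark = SparkSession.builder().appName("SplitExplode").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "1,2,3"), ("b", "4,5")).toDF("id", "csv")
    // split the string into an array, then emit one row per element
    df.withColumn("value", explode(split(col("csv"), ",")))
      .drop("csv")
      .show()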