Re: Using Avro file format with SparkSQL

2022-02-09 Thread frakass
Have you added the dependency to build.sbt? Can you run 'sbt package' on the source successfully? Regards, frakass. On 2022/2/10 11:25, Karanika, Anna wrote: For context, I am invoking spark-submit and adding the arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
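
For reference, a minimal sketch of the build.sbt equivalent of that --packages flag (the 3.2.0 version is taken from the thread; adjust it to match your Spark build):

    // build.sbt -- compile-time equivalent of --packages org.apache.spark:spark-avro_2.12:3.2.0
    libraryDependencies += "org.apache.spark" %% "spark-avro" % "3.2.0"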

Using Avro file format with SparkSQL

2022-02-09 Thread Karanika, Anna
Hello, I have been trying to use Spark SQL's operations that are related to the Avro file format, e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace: Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source:
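
For context, a minimal sketch of reading and writing Avro once spark-avro is on the classpath (the session setup and paths are illustrative, not from the thread):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("AvroExample").getOrCreate()
    // the "avro" short name only resolves when the spark-avro package is loaded;
    // otherwise Spark throws the AnalysisException quoted above
    val df = spark.read.format("avro").load("/tmp/input.avro")  // hypothetical path
    df.write.format("avro").save("/tmp/output")                 // hypothetical path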

Re: StructuredStreaming - foreach/foreachBatch

2022-02-09 Thread karan alang
Thanks, Mich, will check it out. Regards, Karan Alang. On Tue, Feb 8, 2022 at 3:06 PM Mich Talebzadeh wrote: > BTW you can check this LinkedIn article of mine on Processing Change Data Capture with Spark Structured Streaming >
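
For later readers of this thread, a minimal hedged sketch of foreachBatch; the rate source and sink path are made up for illustration:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("ForeachBatchDemo").getOrCreate()
    // hypothetical source: a rate stream, just to have something streaming
    val stream = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

    val query = stream.writeStream
      .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
        // each micro-batch arrives as a plain DataFrame, so any batch writer works here
        batchDF.write.mode("append").parquet(s"/tmp/sink/batch=$batchId")  // hypothetical path
      }
      .start()
    query.awaitTermination()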

Re: Unsubscribe

2022-02-09 Thread Bitfox
Please send an e-mail to user-unsubscr...@spark.apache.org to unsubscribe yourself from the mailing list. On Thu, Feb 10, 2022 at 1:38 AM Yogitha Ramanathan wrote: >

Unsubscribe

2022-02-09 Thread Yogitha Ramanathan

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Mich Talebzadeh
What is the issue you are encountering? Memory bound? Is it GCP nodes (are you using a Dataproc cluster)? Have you checked the logs in GCP? How about the Spark GUI, what does it say? With a two-node cluster you sound like you are doing more of

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Andrew Davidson
Hi Sean, I have 2 big for loops in my code. One for loop uses join to implement R’s cbind(); the other implements R’s rowsum(). Each for loop iterates 10411 times. To debug, I added an action to each iteration of the loop. I think I used count() and logged the results. So I am confident
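
For readers searching the archive: R's rowsum(x, group) sums each column within groups of rows, which maps to a single groupBy/sum in Spark rather than a 10411-iteration loop. A minimal sketch, with made-up column names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("RowsumSketch").getOrCreate()
    val df = spark.createDataFrame(Seq(
      ("g1", 1.0, 2.0),
      ("g1", 3.0, 4.0),
      ("g2", 5.0, 6.0)
    )).toDF("group", "c1", "c2")

    // one shuffle replaces the per-group loop
    df.groupBy("group").sum("c1", "c2").show()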

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Sean Owen
It really depends on what is running out of memory. You can have all the workers in the world, but if something is blowing up the driver, they won't do anything. You can have a huge cluster, but data skew can make it impossible to break up the problem you express. Spark running out of mem is not the same as

Re: Does spark have something like rowsum() in R?

2022-02-09 Thread Andrew Davidson
Hi Sean, Debugging big data projects is always hard. It is a black art that takes a lot of experience. Can you tell me more about “Why you're running out of mem is probably more a function of your parallelism, cluster size”? I have a cluster with 2 worker nodes, each with 1.4 TB of memory, 96
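
Not an answer from the thread, but a hedged sketch of the knobs Sean is referring to: the partition count, not node memory, decides how finely the work is split. The 192 below merely assumes the truncated "96" means 96 cores per node:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.read.parquet("/tmp/counts")             // hypothetical input
    println(df.rdd.getNumPartitions)                       // how finely the data is currently split
    val evenly = df.repartition(192)                       // e.g. 2 nodes x 96 cores, fully illustrative
    spark.conf.set("spark.sql.shuffle.partitions", "192")  // partition count after joins/groupBys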

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Bitfox
Hi, I am not sure about the overall situation, but if you want a Scala solution, I think you could use a regex to match and capture the keywords. Here is one I wrote that you can modify on your end. import scala.io.Source import scala.collection.mutable.ArrayBuffer val list1 =
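
The quoted code is cut off by the archive; a self-contained sketch in the same spirit, with invented sample lines and pattern:

    import scala.collection.mutable.ArrayBuffer

    // made-up lines standing in for the unstructured file
    val lines = Seq("Plano: Especial", "Nome: Maria", "no keyword on this line")
    val pattern = """(\w+):\s+(.*)""".r   // capture "key: value" pairs

    val pairs = ArrayBuffer[(String, String)]()
    for (line <- lines) line match {
      case pattern(key, value) => pairs += ((key, value))
      case _                   => // ignore lines that do not match
    }
    pairs.foreach(println)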

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
Hello, how are you? Thanks for your time. > Does the data contain records? Yes. > Are the records "homogeneous", i.e., do they have the same fields? Yes, the data is homogeneous but has “two layouts” in the same file. > What is the format of the data? All data is strings in a .txt file. > Are records

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
Hello, Yes, for this block I can open it as CSV with '#' as the delimiter, but there is a block that is not in CSV format; that one is likely key-value. We have two different layouts in the same file. That is the “problem”. Thanks for your time. > Relação de Beneficiários Ativos e Excluídos (List of Active and Excluded Beneficiaries) > Carteira
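
For the archive, a hedged sketch of one way to handle the two layouts: read the file as plain text and route lines by shape before parsing. The '#' heuristic and the path are assumptions, not from the thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("TwoLayouts").getOrCreate()
    import spark.implicits._

    val raw = spark.read.textFile("/tmp/beneficiarios.txt")   // hypothetical path
    val csvLines = raw.filter(_.contains("#"))                // assumed: the '#'-delimited layout
    val kvLines  = raw.filter(l => !l.contains("#") && l.contains(":"))  // assumed: the key-value layout

    // split the CSV-shaped lines on '#' into arrays of fields
    val csvDF = csvLines.map(_.split("#").map(_.trim)).toDF("fields")
    csvDF.show(truncate = false)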

Re: flatMap for dataframe

2022-02-09 Thread Khalid Mammadov
One way is to split -> explode -> pivot. These are Column and DataFrame methods. Here are quick examples from the web: https://www.google.com/amp/s/sparkbyexamples.com/spark/spark-split-dataframe-column-into-multiple-columns/amp/
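
A quick hedged illustration of the split -> explode step, with invented column names; see the linked article for the pivot part:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode, split}

    val spark = SparkSession.builder().appName("SplitExplode").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "1,2,3"), ("b", "4,5")).toDF("id", "csv")
    // split the string into an array, then emit one row per element
    df.withColumn("value", explode(split(col("csv"), ",")))
      .drop("csv")
      .show()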