End of Stream errors in shuffle

2018-01-15 Thread Fernando Pereira
Hi, I'm facing a very strange error that occurs halfway through long-running Spark SQL jobs: 18/01/12 22:14:30 ERROR Utils: Aborting task java.io.EOFException: reached end of stream after reading 0 bytes; 96 bytes expected at org.spark_project.guava.io.ByteStreams.readFully(ByteStreams.java:735) at

Re: Dynamic data ingestion into SparkSQL - Interesting question

2017-11-21 Thread Fernando Pereira
Did you consider doing string processing to build the SQL expression, which you can then execute with spark.sql(...)? Some examples: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables Cheers On 21 November 2017 at 03:27, Aakash Basu wrote: > Hi all,
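
To make the suggestion above concrete, here is a minimal sketch of building a SQL string in Python and handing it to spark.sql(...). It is only an illustration, not code from the thread: the helper names (quote_identifier, build_select, run_query), the table and column names, and the "minimum value per column" filter shape are all hypothetical. The only Spark call assumed is SparkSession.sql, which takes a query string and returns a DataFrame.

```python
import re

# Conservative pattern for table/column names; anything else is rejected so
# that dynamically supplied names cannot inject arbitrary SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(name):
    """Backtick-quote a table or column name after a basic sanity check."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return f"`{name}`"


def build_select(table, columns, min_values=None):
    """Assemble a SELECT statement as a plain string.

    min_values is an optional dict mapping a column name to a numeric lower
    bound (a hypothetical filter shape chosen for this example).
    """
    select_list = ", ".join(quote_identifier(c) for c in columns)
    query = f"SELECT {select_list} FROM {quote_identifier(table)}"
    if min_values:
        # Sort by column name so the generated SQL is deterministic.
        conditions = [
            f"{quote_identifier(col)} >= {float(bound)}"
            for col, bound in sorted(min_values.items())
        ]
        query += " WHERE " + " AND ".join(conditions)
    return query


def run_query(spark, query):
    """Execute the generated string; `spark` is an existing SparkSession."""
    return spark.sql(query)
```

With a live session, something like run_query(spark, build_select("events", ["user_id", "amount"], {"amount": 10})) would return a DataFrame, assuming a table or temporary view named events has been registered. Validating identifiers before interpolating them is a design choice of this sketch; it matters whenever the names come from incoming data rather than from the program itself.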

Re: Multiple transformations without recalculating or caching

2017-11-17 Thread Fernando Pereira
don't you so that and then read it again and get > your stats? > > On Fri, 17 Nov 2017, 10:03 Fernando Pereira, <ferdonl...@gmail.com> wrote: > >> Dear Spark users >> >> Is it possible to take the output of a transformation (RDD/Dataframe) and >> feed it to

Multiple transformations without recalculating or caching

2017-11-17 Thread Fernando Pereira
Dear Spark users, Is it possible to take the output of a transformation (RDD/DataFrame) and feed it to two independent transformations without recalculating the first transformation and without caching the whole dataset? Consider the case of a very large dataset (1+TB) which suffered several