[Structured streaming, V2] commit on ContinuousReader

2018-05-03 Thread Jiří Syrový
Version: 2.3, DataSourceV2, ContinuousReader Hi, We're creating a new data source that fetches data from a streaming source which requires committing received data. We would like to commit the data periodically, once it has been retrieved and correctly processed, and then fetch more. One option could

Re: java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher

2018-03-28 Thread Jiří Syrový
Quick comment: Excel CSV (a very special case, though) supports arrays in CSV by allowing "\n" inside quoted fields, but you then have to use "\r\n" (the Windows EOL) as the row terminator. Cheers, Jiri 2018-03-28 14:14 GMT+02:00 Yong Zhang : > Your dataframe has array data type, which is NOT
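A minimal illustration of the Excel-style convention described above (embedded "\n" inside a quoted field, rows terminated by "\r\n"), using Python's stdlib csv module rather than Spark's CSV reader; this is a sketch of the file format only:

```python
import csv
import io

# Excel-style CSV: an embedded "\n" inside a quoted field acts as an
# in-cell line break, while rows themselves end with "\r\n".
raw = 'id,values\r\n1,"a\nb\nc"\r\n'

# The reader keeps the quoted newlines inside a single field value.
rows = list(csv.reader(io.StringIO(raw)))
```

The quoted field comes back as one value containing the "\n" characters, which is the behavior the thread relies on for encoding array-like data in a cell.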

org.apache.spark.ui.jobs.UIData$TaskMetricsUIData

2017-03-17 Thread Jiří Syrový
Hi, is there a good way to get rid of UIData completely? I have switched off the UI and decreased the retainedXXX settings to the minimum, but there still seem to be a lot of instances of this class ($SUBJ) held in memory. Any ideas? Thanks, J. S. spark { master = "local[2]" master = ${?SPARK_MASTER} info =
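For reference, these are the UI-related settings I would expect to matter here (a sketch; names per the Spark configuration docs for the 1.x/2.x line). Note that, if memory serves, in pre-2.3 Spark the JobProgressListener that populates UIData is registered even when the UI is disabled, so only the retained* limits actually bound its memory use:

```properties
# Disable the web UI entirely
spark.ui.enabled                 false
# Bound the in-memory history kept for the UI / listener
spark.ui.retainedJobs            100
spark.ui.retainedStages          100
spark.sql.ui.retainedExecutions  100
```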

Re: Dependency Injection and Microservice development with Spark

2017-01-04 Thread Jiří Syrový
Hi, another nice approach is to use the Reader monad instead, together with a framework that supports this style (e.g. Grafter - https://github.com/zalando/grafter). It's lightweight and helps a bit with dependency issues. 2016-12-28 22:55 GMT+01:00 Lars Albertsson : > Do you
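A minimal sketch of the Reader-monad idea mentioned above (dependency injection by deferring the environment to the program's edge), written in Python rather than Grafter's Scala; all names here are illustrative:

```python
class Reader:
    """A computation that depends on an environment supplied later."""

    def __init__(self, run):
        self.run = run  # run: env -> value

    def map(self, f):
        return Reader(lambda env: f(self.run(env)))

    def flat_map(self, f):
        return Reader(lambda env: f(self.run(env)).run(env))


# "Components" read their dependencies from the environment instead of
# receiving them through constructors.
get_db = Reader(lambda env: env["db"])

def find_user(user_id):
    return get_db.map(lambda db: db.get(user_id))

# The concrete environment is injected only at the outermost layer.
env = {"db": {1: "alice"}}
user = find_user(1).run(env)
```

The point of the pattern is that `find_user` never sees a concrete database; swapping a test double for the real dependency is just passing a different `env`.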

Re: Can't zip RDDs with unequal numbers of partitions

2016-03-19 Thread Jiří Syrový
Thu, Mar 17, 2016 at 10:03 AM, Jiří Syrový <syrovy.j...@gmail.com> > wrote: > > Hi, > > > > any idea what could be causing this issue? It started appearing after > > changing parameter > > > > spark.sql.autoBroadcastJoinThreshold to 10 > > >

Can't zip RDDs with unequal numbers of partitions

2016-03-18 Thread Jiří Syrový
Hi, any idea what could be causing this issue? It started appearing after changing the parameter *spark.sql.autoBroadcastJoinThreshold to 10* Caused by: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions at
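For context, the contract behind that error: RDD.zip requires both RDDs to have the same number of partitions and the same number of elements in each corresponding partition, and lowering autoBroadcastJoinThreshold can turn a broadcast join into a shuffle join, which changes the partitioning of one side. A pure-Python sketch of the precondition (not Spark code; partitions are modeled as lists of lists):

```python
def zip_partitioned(a, b):
    """Mimic RDD.zip's precondition: same partition count and the
    same number of elements in each corresponding partition."""
    if len(a) != len(b):
        raise ValueError("Can't zip RDDs with unequal numbers of partitions")
    out = []
    for pa, pb in zip(a, b):
        if len(pa) != len(pb):
            raise ValueError(
                "Can only zip RDDs with same number of elements per partition")
        out.append(list(zip(pa, pb)))
    return out

# Two "RDDs" with matching partitioning zip element-wise per partition.
pairs = zip_partitioned([[1, 2], [3]], [["a", "b"], ["c"]])
```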

Re: jobs much slower in cluster mode vs local

2016-01-15 Thread Jiří Syrový
Hi, you can try to use Spark Job Server and submit jobs to it. The thing is that the most expensive part is the context creation. J. 2016-01-15 15:28 GMT+01:00 : > Hello, > > In general, I am usually able to run spark submit jobs in local mode, in a > 32-cores node

Re: FileNotFoundException in appcache shuffle files

2015-12-10 Thread Jiří Syrový
Usually there is another error or log message before the FileNotFoundException. Try checking your logs for something like that. 2015-12-10 10:47 GMT+01:00 kendal : > I have similar issues... Exception only with very large data. > And I tried to double the memory or partition as

Re: Spark SQL - saving to multiple partitions in parallel - FileNotFoundException on _temporary directory possible bug?

2015-12-08 Thread Jiří Syrový
Hi, I have a very similar issue on a standalone SQL context, but when using save() instead. I guess it might be related to https://issues.apache.org/jira/browse/SPARK-8513. It also usually happens after using groupBy. Regards, Jiri 2015-12-08 0:16 GMT+01:00 Deenar Toraskar

Fwd: UnresolvedException - lag, window

2015-11-05 Thread Jiří Syrový
Hi, I'm getting the following exception with Spark 1.5.2-rc2 (haven't tried 1.6.0 yet, though) when using the window function lag: [2015-11-05 10:58:50,806] ERROR xo.builder.jobs.CompareJob [] [akka://JobServer/user/context-supervisor/MYCONTEXT] - Comparison has failed
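For readers unfamiliar with the function involved, this is what lag computes within each window partition, as a pure-Python sketch of the semantics (not Spark's implementation):

```python
def lag(values, offset=1, default=None):
    """SQL lag(): for each row, the value from `offset` rows earlier
    in the partition, or `default` when no such row exists."""
    pad = [default] * min(offset, len(values))
    return pad + values[: max(len(values) - offset, 0)]

# First row has no predecessor, so it gets the default (None).
lagged = lag([10, 20, 30])
```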