Re: Manually reading parquet files.

2019-03-21 Thread Ryan Blue
> doopConfWithOptions(relation.options))
> )
>
> import scala.collection.JavaConverters._
>
> val rows = readFile(pFile).flatMap(_ match {
>   case r: InternalRow => Seq(r)
>   // This doesn't work. vector mode is doing something screwy
>   case b: ColumnarBatch => b.rowIterator().asScala
> }).toList
>
> println(rows)
> // List([0,1,5b,24,66647361])
> // ?? this is wrong I think
>
> Has anyone attempted something similar?
>
> Cheers,
> Andrew

--
Ryan Blue
Software Engineer
Netflix
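A likely culprit (not spelled out in the truncated quote above) is object reuse: Spark's readers may return the same mutable InternalRow instance on every iteration, so materializing rows without copying them yields garbage. A minimal sketch of the copying workaround, assuming readFile and pFile are the same values as in the quoted code:

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.vectorized.ColumnarBatch
    import scala.collection.JavaConverters._

    // Copy every row before collecting: both the row-based and the
    // vectorized (ColumnarBatch) paths may recycle the row object.
    val rows = readFile(pFile).flatMap {
      case r: InternalRow   => Seq(r.copy())
      case b: ColumnarBatch => b.rowIterator().asScala.map(_.copy())
    }.toList

    println(rows)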

Re: DataSourceV2 producing wrong date value in Custom Data Writer

2019-02-05 Thread Ryan Blue
> get(0, DataTypes.DateType));
> }
>
> It prints an integer as output:
>
> MyDataWriter.write: 17039
>
> Is this a bug, or am I doing something wrong?
>
> Thanks,
> Shubham

--
Ryan Blue
Software Engineer
Netflix
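The integer is expected behavior rather than a bug: Spark stores DateType values internally as the number of days since the Unix epoch (1970-01-01). A minimal sketch of decoding such a value; the variable names are illustrative:

    import java.time.LocalDate

    // InternalRow holds DateType as an Int of days since 1970-01-01.
    val days = 17039
    val date = LocalDate.ofEpochDay(days)  // 2016-08-26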

Re: DataSourceV2 APIs creating multiple instances of DataSourceReader and hence not preserving the state

2018-10-19 Thread Ryan Blue
>>> elson, Assaf wrote:
>>>
>>> Could you add a fuller code example? I tried to reproduce it in my
>>> environment and I am getting just one instance of the reader…
>>>
>>> Thanks,
>>> Assaf

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-03 Thread Ryan Blue
> or shouldn't come. Let me know if this understanding is correct
>
> On Tue, May 1, 2018 at 9:37 PM, Ryan Blue <rb...@netflix.com> wrote:
>
>> This is usually caused by skew. Sometimes you can work around it by
>> increasing the number of partitions like you tri
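Since the reply attributes the failure to skew, one standard workaround beyond raising the partition count is salting: split a hot key across several sub-keys, aggregate partially, then combine. A minimal sketch, where df, key, value, and the bucket count are all illustrative:

    import org.apache.spark.sql.functions._

    val saltBuckets = 100  // tune to the observed skew

    // Stage 1: spread each key across saltBuckets sub-partitions.
    val partial = df
      .withColumn("salt", (rand() * saltBuckets).cast("int"))
      .groupBy(col("key"), col("salt"))
      .agg(sum(col("value")).as("partial"))

    // Stage 2: combine the per-salt partial sums into one row per key.
    val result = partial
      .groupBy(col("key"))
      .agg(sum(col("partial")).as("total"))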

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-01 Thread Ryan Blue
> kFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:419)
> at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:349)

--
Ryan Blue
Software Engineer
Netflix
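"Too large frame" is raised when a single shuffle block exceeds the roughly 2 GB frame limit of Spark's network layer. The usual first step is to shrink individual shuffle blocks by raising the shuffle partition count; a minimal sketch, assuming spark is an existing SparkSession and the value is illustrative:

    // More shuffle partitions -> smaller blocks per partition,
    // keeping each fetched frame well under the ~2 GB limit.
    spark.conf.set("spark.sql.shuffle.partitions", "2000")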

Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-02 Thread Ryan Blue
>> . memoryOverhead.
>>
>> Driver memory=4g, executor mem=12g, num-executors=8, executor cores=8
>>
>> Do you think the settings below can help me overcome the above issue:
>>
>> spark.default.parallelism=1000
>> spark.sql.shuffle.partitions=1000
>>
>> Because the default max number of partitions is 1000.

--
Ryan Blue
Software Engineer
Netflix
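For reference, a sketch of how those settings, plus the memoryOverhead the thread mentions, would be applied. The config keys are real Spark-on-YARN settings of that era; the overhead value is illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.sql.shuffle.partitions", "1000")
      .config("spark.default.parallelism", "1000")
      // Pre-Spark-2.3 name for the extra YARN container memory, in MB.
      .config("spark.yarn.executor.memoryOverhead", "2048")
      .getOrCreate()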

Re: What is correct behavior for spark.task.maxFailures?

2017-04-24 Thread Ryan Blue
for a stage. In that version, you probably want to set spark.blacklist.task.maxTaskAttemptsPerExecutor. See the settings docs <http://spark.apache.org/docs/latest/configuration.html> and search for "blacklist" to see all the options.

rb

On Mon, Apr 24, 2017 at 9:41 AM, Ryan Blue <rb...@netflix.c
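A sketch of the blacklist-related settings referenced above; the keys come from the Spark configuration docs and the values are illustrative:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.blacklist.enabled", "true")
      // Attempts of one task on the same executor before blacklisting it there.
      .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
      // Total attempts of one task before the stage fails.
      .set("spark.task.maxFailures", "4")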

Re: What is correct behavior for spark.task.maxFailures?

2017-04-24 Thread Ryan Blue
> Regards,
> Sumit Chawla

--
Ryan Blue
Software Engineer
Netflix

Re: Driver hung and ran out of memory while writing to console progress bar

2017-02-10 Thread Ryan Blue
progress" > java.lang.OutOfMemoryError: Java heap space at > java.util.Arrays.copyOfRange(Arrays.java:3664) at > java.lang.String.(String.java:207) at > java.lang.StringBuilder.toString(StringBuilder.java:407) at > scala.collection.mutable.StringBuilder.toString(StringBuilder.scala:430) > at org.apache.spark.ui.ConsoleProgressBar.show(ConsoleProgressBar.scala:101) > at > org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:71) > at > org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:55) > at java.util.TimerThread.mainLoop(Timer.java:555) at > java.util.TimerThread.run(Timer.java:505) > > -- Ryan Blue Software Engineer Netflix

Re: Apache Hive with Spark Configuration

2017-01-03 Thread Ryan Blue
> astore, can you tell me which version is more compatible with Spark 2.0.2?
>
> Thanks

--
Ryan Blue
Software Engineer
Netflix
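When a particular Hive metastore version is required, Spark SQL can be pointed at it explicitly. A minimal sketch with the real config keys; the version value is illustrative (Spark 2.0.x bundles Hive 1.2.1 support as its builtin):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .enableHiveSupport()
      // Metastore version to connect to; "builtin" uses the jars shipped with Spark.
      .config("spark.sql.hive.metastore.version", "1.2.1")
      .config("spark.sql.hive.metastore.jars", "builtin")
      .getOrCreate()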

Re: Do we need schema for Parquet files with Spark?

2016-03-04 Thread Ryan Blue
>> know dictionary of words if there is no schema provided by user? Where/how
>> to specify my schema / config for Parquet format?
>>
>> Could not find Apache Parquet mailing list in the official site. It would
>> be great if anyone could share it as well.
>>
>> Regards,
>> Ashok

--
Ryan Blue
Software Engineer
Netflix
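Parquet files are self-describing: the schema (and per-column metadata such as dictionary pages) lives in each file's footer, so Spark needs no user-supplied schema to read them. A minimal sketch, assuming spark is an existing SparkSession and the path is illustrative:

    // The schema is read from the Parquet file footer; nothing to declare.
    val df = spark.read.parquet("/path/to/data.parquet")
    df.printSchema()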