Re: how to add columns to row when column has a different encoder?

2018-02-28 Thread David Capwell
interface to add such a UDF? Thanks for your help! On Mon, Feb 26, 2018, 3:50 PM David Capwell <dcapw...@gmail.com> wrote: > I have a row that looks like the following pojo > > case class Wrapper(var id: String, var bytes: Array[Byte]) > > Those bytes are a serialized

Re: SizeEstimator

2018-02-27 Thread David Capwell
n a different stage). Make sure the number of executors is small (for example, only one), else you are reducing the size of M for each executor. On Mon, Feb 26, 2018, 10:04 PM 叶先进 <advance...@gmail.com> wrote: > What type is the buffer you mentioned? > > > On 27 Feb 2018, at

Re: SizeEstimator

2018-02-26 Thread David Capwell
rote: > Thanks David. Another solution is to convert the protobuf object to a byte > array; it does speed up SizeEstimator > > On Mon, Feb 26, 2018 at 5:34 PM, David Capwell <dcapw...@gmail.com> wrote: > >> This is used to predict the current cost of memory so spark kno

Re: SizeEstimator

2018-02-26 Thread David Capwell
This is used to predict the current cost of memory so spark knows whether to flush or not. This is very costly for us, so we use a flag marked in the code as private to lower the cost: spark.shuffle.spill.numElementsForceSpillThreshold (on phone, hope no typo) - how many records before a flush. This lowers
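As a sketch, the flag mentioned above could be set through SparkConf; note the message itself marks the setting as private/internal, so the name and semantics may differ across Spark versions:

```scala
import org.apache.spark.SparkConf

// Internal, undocumented knob (per the message above): force a spill after
// this many records instead of relying on SizeEstimator's cost model.
val conf = new SparkConf()
  .set("spark.shuffle.spill.numElementsForceSpillThreshold", "5000000")
```

The trade-off is that spilling by record count sidesteps the per-record SizeEstimator walk, at the cost of no longer sizing spills to actual memory pressure.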

how to add columns to row when column has a different encoder?

2018-02-26 Thread David Capwell
I have a row that looks like the following pojo case class Wrapper(var id: String, var bytes: Array[Byte]) Those bytes are a serialized pojo that looks like this case class Inner(var stuff: String, var moreStuff: String) Right now I have encoders for both types, but I don't see how to
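One way the combination described above could be sketched in Scala: map to a tuple and supply an explicit tuple encoder built from the per-type encoders. `deserializeInner` is a hypothetical stand-in for whatever deserializer produced the inner bytes; the rest assumes only the public `Encoders` factory methods:

```scala
import org.apache.spark.sql.{Dataset, Encoders}

case class Wrapper(var id: String, var bytes: Array[Byte])
case class Inner(var stuff: String, var moreStuff: String)

// Hypothetical: stands in for whatever deserializer produced `bytes`.
def deserializeInner(bytes: Array[Byte]): Inner = ???

// Combine the two encoders by mapping to a tuple and passing an
// explicit tuple encoder built from the per-type encoders.
def addInnerColumns(ds: Dataset[Wrapper]): Dataset[(String, Inner)] =
  ds.map(w => (w.id, deserializeInner(w.bytes)))(
    Encoders.tuple(Encoders.STRING, Encoders.product[Inner]))
```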

Re: Encoder with empty bytes deserializes with non-empty bytes

2018-02-21 Thread David Capwell
ent :: Nil) This causes the java code to see a byte[], which uses a different code path than the one linked. Since I did ArrayType(ByteType), I had to wrap the data in an ArrayData class On Wed, Feb 21, 2018 at 9:55 PM, David Capwell <dcapw...@gmail.com> wrote: > I am trying to create an Encoder for

Encoder with empty bytes deserializes with non-empty bytes

2018-02-21 Thread David Capwell
I am trying to create an Encoder for protobuf data and noticed something rather weird. When we have an empty ByteString (not null, just empty), when we deserialize we get back a non-empty array of length 8. I took the generated code and see something weird going on. UnsafeRowWriter 1. public
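A workaround consistent with the later reply in this thread (converting the protobuf object to a byte array first) could look like the following sketch; `Carrier` is a hypothetical wrapper type, not part of any thread:

```scala
import com.google.protobuf.ByteString
import org.apache.spark.sql.{Encoder, Encoders}

// Carry the protobuf payload as Array[Byte], which Spark's built-in
// product encoder handles directly, instead of encoding ByteString itself.
case class Carrier(id: String, payload: Array[Byte])

def toCarrier(id: String, bs: ByteString): Carrier =
  Carrier(id, bs.toByteArray) // an empty ByteString becomes an empty array

val carrierEncoder: Encoder[Carrier] = Encoders.product[Carrier]
```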

Dynamic Accumulators in 2.x?

2017-10-11 Thread David Capwell
I wrote a spark instrumentation tool that instruments RDDs to give more fine-grained details on what is going on within a Task. This works right now, but uses volatiles and CAS to pass this state around (which slows down the task). We want to lower the overhead of this and make the main call
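For comparison, the stock 2.x accumulator API requires registering accumulators on the driver before the job runs, which is presumably what makes the dynamic instrumentation case above awkward. A minimal sketch of the built-in API (the accumulator name is illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator

// Named accumulators are created and registered on the driver up front;
// tasks can only add to them, and values surface in the UI per stage.
def instrumentationCounter(sc: SparkContext): LongAccumulator =
  sc.longAccumulator("task.instrumentation.nanos")
```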

add jars to spark's runtime

2017-10-11 Thread David Capwell
We want to emit the metrics out of spark into our own custom store. To do this we built our own sink and tried to add it to spark by passing --jars path/to/jar and defining the class in the metrics.properties supplied with the job. We noticed that spark kept crashing with the below exception
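A sketch of the wiring described above; the sink name and class are hypothetical. One common cause of class-not-found crashes here (an assumption worth verifying against your Spark version) is that the metrics system initializes before --jars artifacts are fetched, so the sink jar may need to be on spark.driver.extraClassPath and spark.executor.extraClassPath instead:

```properties
# metrics.properties (hypothetical sink name and class)
*.sink.customstore.class=com.example.metrics.CustomStoreSink
# poll period settings, if the sink supports them
*.sink.customstore.period=10
*.sink.customstore.unit=seconds
```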