Davies, that seemed to be my issue; my colleague helped me resolve it. The problem was that we build the RDD<Row> and the corresponding StructType ourselves (no JSON, Parquet, Cassandra, etc. -- we take a list of business objects, convert them to Rows, and then infer the struct type), and I had missed one thing.
--
Be well!
Jean Morozov
On Tue, Oct 6, 2015 at 1:58 AM, Davies Liu <dav...@databricks.com> wrote:
> Could you tell us a way to reproduce this failure? Reading from JSON or
> Parquet?
>
> On Mon, Oct 5, 2015 at 4:28 AM, Eugene Morozov
> <evgeny.a.moro...@gmail.com> wrote:
> > Hi,
> >
> > We're building our own framework on top of Spark, and we give users a
> > pretty complex schema to work with. That requires us to build dataframes
> > ourselves: we transform business objects into rows and struct types, and
> > use these two to create the dataframe.
> >
> > Everything was fine until I started to upgrade to Spark 1.5.0 (from
> > 1.3.1). The catalyst engine seems to have changed, and now, using almost
> > the same code to produce rows and struct types, I get the following:
> > http://ibin.co/2HzUsoe9O96l -- some of the rows in the end result have a
> > different number of values than their corresponding struct types.
> >
> > I'm almost sure it's my own fault, but there is always a small chance
> > that something is wrong in the Spark codebase. If you've seen something
> > similar, or if there is a JIRA for something similar, I'd be glad to
> > know. Thanks.
> > --
> > Be well!
> > Jean Morozov
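[For anyone hitting the same symptom: the failure mode described above -- rows whose value count drifts out of sync with the StructType built alongside them -- is easy to guard against before handing the data to createDataFrame. Below is a minimal plain-Python sketch of such a sanity check; it deliberately avoids a Spark dependency, so the tuple rows and the (name, type) schema pairs stand in for Spark's Row and StructField, and the function name is illustrative, not part of any Spark API.]

```python
def check_rows_against_schema(rows, schema):
    """Return the indices of rows whose value count differs from the
    number of fields in the schema.

    `rows` is a sequence of tuples (standing in for spark Row objects);
    `schema` is a list of (field_name, field_type) pairs (standing in
    for the StructField entries of a StructType).
    """
    expected = len(schema)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Schema with three fields, mimicking
# StructType([StructField("id", ...), StructField("name", ...), ...])
schema = [("id", "long"), ("name", "string"), ("score", "double")]

rows = [
    (1, "alice", 0.9),
    (2, "bob"),          # missing `score` -- would break the dataframe
    (3, "carol", 0.7),
]

print(check_rows_against_schema(rows, schema))  # -> [1]
```

[Running a check like this over the converted business objects before calling createDataFrame pinpoints exactly which objects produced short or long rows, which is usually faster than decoding the catalyst error after the fact.]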