[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-29 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r229099482 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1115,9 +1126,38 @@ object SQLContext

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-29 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r229093388 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1115,9 +1126,38 @@ object SQLContext

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-13 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r224962460 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1115,9 +1126,38 @@ object SQLContext

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-07 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r223212772 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1098,12 +1099,19 @@ object SQLContext { data: Iterator

[GitHub] spark pull request #22646: [SPARK-25654][SQL] Support for nested JavaBean ar...

2018-10-07 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22646#discussion_r223212724 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1115,8 +1123,31 @@ object SQLContext

[GitHub] spark issue #22527: [SPARK-17952][SQL] Nested Java beans support in createDa...

2018-10-05 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/22527 Thanks! I created a new PR with array, list and map support.

[GitHub] spark pull request #22646: Support for nested JavaBean arrays, lists and map...

2018-10-05 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/22646 Support for nested JavaBean arrays, lists and maps in createDataFrame ## What changes were proposed in this pull request? Continuing from #22527, this PR seeks to add support

[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-10-04 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22527#discussion_r222817293 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1098,16 +1098,26 @@ object SQLContext { data: Iterator

[GitHub] spark issue #22527: [SPARK-17952][SQL] Nested Java beans support in createDa...

2018-10-03 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/22527 @ueshin Yes. I am already working on array/list support. Will add maps as well. It shouldn't require a rewrite now that the code is restructured, just new cases in pattern match. So I think

[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-10-03 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22527#discussion_r222415998 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1098,16 +1098,24 @@ object SQLContext { data: Iterator

[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-10-03 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22527#discussion_r222415649 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1098,16 +1098,24 @@ object SQLContext { data: Iterator

[GitHub] spark issue #22527: [SPARK-17952][SQL] Nested Java beans support in createDa...

2018-10-02 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/22527 I restructured the code in this commit to allow easier addition of array/list support in the future.

[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-10-02 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22527#discussion_r222079624 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1100,13 +1101,23 @@ object SQLContext { attrs: Seq

[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-09-23 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/22527#discussion_r219691829 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1100,13 +1101,24 @@ object SQLContext { attrs: Seq

[GitHub] spark pull request #22527: [SPARK-17952][SQL] Nested Java beans support in c...

2018-09-22 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/22527 [SPARK-17952][SQL] Nested Java beans support in createDataFrame ## What changes were proposed in this pull request? When constructing a DataFrame from a Java bean, using nested beans
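As a rough illustration of what "nested beans" means for row construction, here is a pure-Scala sketch with no Spark dependency. It reads bean properties through `java.beans.Introspector` and recurses into any property value that is not a simple leaf, so a bean inside a bean becomes a row inside a row. The `Inner`/`Outer` beans, the `NestedBeanSketch` object and its leaf-type rule are hypothetical and are not the code from the pull request.

```scala
import java.beans.Introspector
import scala.beans.BeanProperty

// Hypothetical example beans: Outer holds a nested Inner bean.
class Inner {
  @BeanProperty var count: Int = 1
}

class Outer {
  @BeanProperty var name: String = "a"
  @BeanProperty var inner: Inner = new Inner
}

object NestedBeanSketch {
  // Assumed leaf rule: strings, boxed numbers and booleans are stored as-is;
  // any other non-null value is treated as a nested bean.
  private def isLeaf(v: AnyRef): Boolean =
    v.isInstanceOf[String] || v.isInstanceOf[java.lang.Number] || v.isInstanceOf[java.lang.Boolean]

  // Read each readable property (sorted by name) and recurse into nested beans,
  // producing a nested Seq that mirrors a struct-within-struct row.
  def toRow(bean: AnyRef): Seq[Any] = {
    val props = Introspector
      .getBeanInfo(bean.getClass, classOf[Object]) // stop class excludes getClass
      .getPropertyDescriptors
      .filter(_.getReadMethod != null)
      .sortBy(_.getName)
    props.toSeq.map { p =>
      p.getReadMethod.invoke(bean) match {
        case null           => null
        case v if isLeaf(v) => v
        case nested         => toRow(nested)
      }
    }
  }
}
```

Without the recursive case, the nested `Inner` value would have no row representation, which is roughly the gap the pull request describes for `createDataFrame`.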

[GitHub] spark issue #20505: [SPARK-23251][SQL] Add checks for collection element Enc...

2018-02-15 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/20505 Yes, that is the idea. Frankly, I am not that familiar with how the compiler resolves all the implicit parameters to say confidently what is going on. But here's my take: I did

[GitHub] spark pull request #20505: [SPARK-23251][SQL] Add checks for collection elem...

2018-02-11 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/20505#discussion_r167437023 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -165,11 +165,15 @@ abstract class SQLImplicits extends

[GitHub] spark pull request #20505: [SPARK-23251][SQL] Add checks for collection elem...

2018-02-05 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/20505#discussion_r165903346 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -165,11 +165,15 @@ abstract class SQLImplicits extends

[GitHub] spark pull request #20505: [SPARK-23251][SQL] Add checks for collection elem...

2018-02-04 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/20505 [SPARK-23251][SQL] Add checks for collection element Encoders Implicit methods of `SQLImplicits` providing Encoders for collections did not check for Encoders for their elements
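A minimal sketch of the underlying idea, assuming a hypothetical `Enc` type class in place of Spark's `Encoder`: the collection instance takes the element instance as an implicit parameter, so a collection of an element type without an instance simply fails to resolve at compile time. This illustrates the technique only; it is not the actual `SQLImplicits` change.

```scala
// Hypothetical type class standing in for an Encoder; not Spark's API.
trait Enc[T] {
  def describe: String
}

object Enc {
  implicit val intEnc: Enc[Int] = new Enc[Int] { def describe: String = "int" }
  implicit val stringEnc: Enc[String] = new Enc[String] { def describe: String = "string" }

  // Checked collection instance: Enc[Seq[E]] is only derivable when an Enc[E]
  // for the element type is also available. An unchecked variant such as
  // `implicit def anySeq[S <: Seq[_]]: Enc[S]` would resolve for any element
  // type, so a missing element encoder would not be caught by the compiler.
  implicit def seqEnc[E](implicit elem: Enc[E]): Enc[Seq[E]] =
    new Enc[Seq[E]] { def describe: String = s"array<${elem.describe}>" }
}
```

Nested collections resolve recursively (`Enc[Seq[Seq[String]]]` goes through `seqEnc` twice), while something like `implicitly[Enc[Seq[Thread]]]` would not compile.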

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Scala Map collecti...

2017-06-10 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r121266969 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -258,6 +265,80 @@ class DatasetPrimitiveSuite extends

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Scala Map collecti...

2017-06-10 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r121265890 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -258,6 +265,80 @@ class DatasetPrimitiveSuite extends

[GitHub] spark pull request #18009: [SPARK-18891][SQL] Support for specific Java List...

2017-06-10 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/18009#discussion_r121263319 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -28,6 +28,8 @@ case class SeqClass(s: Seq[Int

[GitHub] spark issue #16986: [SPARK-18891][SQL] Support for Scala Map collection type...

2017-06-04 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16986 So I tried to simplify the code as much as possible, removing unneeded parameters. I must admit I am not entirely sure about whether I am handling all the data types correctly but everything

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Scala Map collecti...

2017-06-04 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r120010531 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -652,6 +653,299 @@ case class MapObjects

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Scala Map collecti...

2017-06-04 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r120010461 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -652,6 +653,299 @@ case class MapObjects

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Scala Map collecti...

2017-06-04 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r120010289 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -652,6 +653,299 @@ case class MapObjects

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Scala Map collecti...

2017-06-04 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r120009967 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -652,6 +653,299 @@ case class MapObjects

[GitHub] spark issue #16986: [SPARK-18891][SQL] Support for Scala Map collection type...

2017-05-22 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16986 That was because of my other PR that just got accepted. Just a matter of appending unit tests. I resolved the conflict from browser for now. Can rebase later if merge commits

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Map collection typ...

2017-05-18 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r117363594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -329,35 +329,19 @@ object ScalaReflection extends

[GitHub] spark pull request #18011: [SPARK-19089][SQL] Add support for nested sequenc...

2017-05-16 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/18011 [SPARK-19089][SQL] Add support for nested sequences ## What changes were proposed in this pull request? Replaced specific sequence encoders with generic sequence encoder to enable

[GitHub] spark pull request #18009: [SPARK-18891][SQL] Support for specific Java List...

2017-05-16 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/18009 [SPARK-18891][SQL] Support for specific Java List subtypes ## What changes were proposed in this pull request? Add support for specific Java `List` subtypes in deserialization as well

[GitHub] spark issue #16986: [SPARK-18891][SQL] Support for Map collection types

2017-04-16 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16986 Rebased onto the current master and integrated a few minor changes from the code review of #16541 in case anyone is still interested in this feature

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-30 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 @ueshin Thanks for the fix

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-26 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Thanks. Made the suggested changes in my latest commit. I also encountered a minor problem when doing final testing. When using a collection type that is a type alias (e.g., scala.List

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-03-19 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16541#discussion_r106810993 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -589,6 +590,170 @@ case class MapObjects

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-03-19 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16541#discussion_r106810940 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SequenceBenchmark.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-03-19 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16541#discussion_r106810789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -589,6 +590,170 @@ case class MapObjects

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-15 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 That seems to be the case here, yes. What about the other benefits I mentioned (adding support for Java `List`s and future Scala 2.13 compatibility)? I think the codegen is also more

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-10 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Well, technically yes. But I would say it's a little more than that. The current approach to deserialization of `Seq`s is to copy the data into an array, construct a `WrappedArray

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-04 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Also please note the [UnsafeArrayData-producing branch](https://github.com/michalsenkyr/spark/compare/dataset-seq-builder...michalsenkyr:dataset-seq-builder-unsafe) that is not yet merged

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-04 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Would it be possible for somebody to review this PR for me? I have a few ideas that are dependent on this and I'd like to get to work on them. Most notably support for Java Lists. Maybe

[GitHub] spark issue #16986: [SPARK-18891][SQL] Support for Map collection types

2017-02-26 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16986 Added support for Java Maps with support for pre-allocation (capacity argument on constructor) and sensible defaults for interfaces/abstract classes. Also includes implicit encoders
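A hedged sketch of what "sensible defaults for interfaces/abstract classes" combined with capacity-based pre-allocation could look like. The `newJavaMap` helper and its interface-to-class table (`java.util.Map`/`AbstractMap` to `HashMap`, `SortedMap`/`NavigableMap` to `TreeMap`) are assumptions for illustration, not the mapping used in the PR.

```scala
import java.util

object JavaMapDefaults {
  // Hypothetical helper: create an empty java.util.Map of the requested type,
  // passing `capacity` as an initial-capacity hint where the class accepts one.
  def newJavaMap[K, V](requested: Class[_], capacity: Int): util.Map[K, V] =
    if (requested == classOf[util.Map[_, _]] || requested == classOf[util.AbstractMap[_, _]]) {
      // Interface or abstract class: assumed default is HashMap(initialCapacity).
      new util.HashMap[K, V](capacity)
    } else if (requested == classOf[util.SortedMap[_, _]] || requested == classOf[util.NavigableMap[_, _]]) {
      // Sorted interfaces: assumed default is TreeMap, which has no capacity constructor.
      new util.TreeMap[K, V]()
    } else {
      // Concrete class: prefer an (int initialCapacity) constructor, else the no-arg one.
      val instance: Any =
        try requested.getConstructor(Integer.TYPE).newInstance(Int.box(capacity))
        catch {
          case _: NoSuchMethodException => requested.getConstructor().newInstance()
        }
      instance.asInstanceOf[util.Map[K, V]]
    }
}
```

Pre-sizing matters because deserialization knows the number of entries up front, so the map can avoid rehashing while it is filled.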

[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Map collection typ...

2017-02-18 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/16986 [SPARK-18891][SQL] Support for Map collection types ## What changes were proposed in this pull request? Add support for arbitrary Scala `Map` types in deserialization as well

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-02-02 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Apologies for taking so long. I tried modifying the serialization logic as best as I could to serialize into `UnsafeArrayData` ([branch diff](https://github.com/michalsenkyr/spark

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-01-18 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 I added the benchmarks based on the code you provided but I am getting almost the same results before and after the optimization (see description). So either the added benefit is really small

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-01-15 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Added benchmarks. I didn't find any standardized way of benchmarking codegen so I wrote a simple script for Spark Shell. Benchmarks were run on a laptop so the collections couldn't

[GitHub] spark pull request #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow...

2017-01-12 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16546#discussion_r95927063 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -120,6 +120,32 @@ object ScalaReflection extends

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-01-12 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16541#discussion_r95923927 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -589,6 +590,171 @@ case class MapObjects

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-01-12 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16541#discussion_r95921320 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -589,6 +590,171 @@ case class MapObjects

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-01-12 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16541#discussion_r95909808 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -589,6 +590,171 @@ case class MapObjects

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-01-11 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Added codegen comparison for a simple `List` dataset. I will also prepare a benchmark and add some results later. Those will be for `List`, `mutable.Queue` and `Seq`. Where `List

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-01-11 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16541#discussion_r95667424 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -589,6 +590,171 @@ case class MapObjects

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-01-10 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Also, the new `CollectObjects` copies quite a bit of code from `MapObjects`. Should I move the code into a common trait in order to reduce duplication or should I leave it as is?

[GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...

2017-01-10 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/16541 [SPARK-19088][SQL] Optimize sequence type deserialization codegen ## What changes were proposed in this pull request? Optimization of arbitrary Scala sequence deserialization

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2017-01-05 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Not sure how to run MiMa tests locally so I tried my best to figure out what was necessary. Hope this fixes it. The downside of the fix is that I had to restore the original methods

[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2017-01-03 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16240#discussion_r94504343 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -130,6 +130,30 @@ class DatasetPrimitiveSuite extends

[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2016-12-24 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16240#discussion_r93816269 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -100,31 +97,36 @@ abstract class SQLImplicits { // Seqs

[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2016-12-23 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16240#discussion_r93807181 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -312,12 +312,46 @@ object ScalaReflection extends

[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2016-12-23 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16240#discussion_r93805236 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -100,31 +97,36 @@ abstract class SQLImplicits { // Seqs

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-22 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 I actually read that but IDEA complained when I tried to place the `Product` encoder into a separate trait. So I opted for specificity. However, I tried it again right now and even though

[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-22 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16329#discussion_r93682192 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2016-12-21 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16329 If you are having trouble building Javadoc, try switching to Java 7 temporarily. Java 8 introduced stricter Javadoc rules that may fail the docs build. Unfortunately Jenkins doesn't, so new

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-20 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 None of them. The compilation will fail. That is why I had to provide those additional implicits. ``` scala> class Test[T] defined class Test scala> implicit def

[GitHub] spark issue #16157: [SPARK-18723][DOC] Expanded programming guide informatio...

2016-12-15 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16157 Sorry for the delay. You are probably right that the partitioning is primarily determined by data locality and that it is therefore appropriate in some cases and shouldn't be worded

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-11 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Possible optimization: Instead of conversions using `to`, we can use `Builder`s. This way we could get rid of the conversion overhead. This would require adding a new codegen method that would
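To illustrate the `Builder` idea in plain Scala: the hypothetical `fromArray` helper below appends elements straight into the target collection's builder instead of first materializing an intermediate sequence and then converting it with `to`/`toList`. The function-valued `newBuilder` parameter is an assumption standing in for however generated code would obtain a builder for the requested collection type.

```scala
import scala.collection.mutable

object BuilderDeserializationSketch {
  // Hypothetical helper, not Spark's generated code: fill the target
  // collection directly through its Builder, avoiding an extra conversion copy.
  def fromArray[A, C](raw: Array[A], newBuilder: () => mutable.Builder[A, C]): C = {
    val builder = newBuilder()
    builder.sizeHint(raw.length) // only a hint; some builders ignore it
    var i = 0
    while (i < raw.length) {
      builder += raw(i)
      i += 1
    }
    builder.result()
  }
}
```

The same helper covers other sequence types by swapping the builder, e.g. `fromArray(Array(1, 2, 3), () => mutable.Queue.newBuilder[Int])` produces a `Queue`, in line with the Queue/ArrayBuffer cases mentioned in this thread.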

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-10 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Added support for arbitrary sequences. Now also Queues, ArrayBuffers and such can be used in datasets (all are serialized into ArrayType). I had to alter and add new implicit

[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

2016-12-09 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 I would like to add that the conversion is specific to `List[_]`. I can add support for arbitrary sequence types through the use of `CanBuildFrom` if it is desirable. We can also

[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2016-12-09 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/16240 [SPARK-16792][SQL] Dataset containing a Case Class with a List type causes a CompileException (converting sequence to list) ## What changes were proposed in this pull request? Added

[GitHub] spark pull request #16157: [SPARK-18723][DOC] Expanded programming guide inf...

2016-12-09 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16157#discussion_r91716176 --- Diff: docs/programming-guide.md --- @@ -347,7 +347,7 @@ Some notes on reading files with Spark: Apart from text files, Spark's Scala API

[GitHub] spark pull request #16201: [SPARK-3359][DOCS] Fix greater-than symbols in Ja...

2016-12-07 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16201#discussion_r91429780 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala --- @@ -81,7 +81,7 @@ class DecisionTreeClassifier

[GitHub] spark issue #16201: [SPARK-3359][DOCS] Fix greater-than symbols in Javadoc t...

2016-12-07 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16201 This time I inspected both the generated Javadoc and Scaladoc. It should be fine now.

[GitHub] spark pull request #16201: [SPARK-3359][DOCS] Fix greater-than symbols in Jav...

2016-12-07 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/16201 [SPARK-3359][DOCS] Fix greater-than symbols in Javadoc to allow building with Java 8 ## What changes were proposed in this pull request? The API documentation build was failing when

[GitHub] spark pull request #16157: [SPARK-18723][DOC] Expanded programming guide inf...

2016-12-07 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16157#discussion_r91405625 --- Diff: docs/programming-guide.md --- @@ -347,7 +347,7 @@ Some notes on reading files with Spark: Apart from text files, Spark's Scala API

[GitHub] spark issue #16157: [SPARK-18723][DOC] Expanded programming guide informatio...

2016-12-07 Thread michalsenkyr
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16157 I added a few more sentences describing the cases in which the user might want to use the argument. However, I am afraid this might be a little too descriptive.

[GitHub] spark pull request #16157: [SPARK-18723][DOC] Expanded programming guide inf...

2016-12-05 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request: https://github.com/apache/spark/pull/16157#discussion_r90975179 --- Diff: docs/programming-guide.md --- @@ -347,7 +347,7 @@ Some notes on reading files with Spark: Apart from text files, Spark's Scala API

[GitHub] spark pull request #16157: [SPARK-18723][DOC] Expanded programming guide inf...

2016-12-05 Thread michalsenkyr
GitHub user michalsenkyr opened a pull request: https://github.com/apache/spark/pull/16157 [SPARK-18723][DOC] Expanded programming guide information on wholeTex… ## What changes were proposed in this pull request? Add additional information to wholeTextFiles