[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 @ueshin Thanks for the fix --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16541 I sent a pr #17473.
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16541 This PR unfortunately broke Scala 2.10 compilation https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/4110/console
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16541 thanks, merging to master!
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16541 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75263/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16541 **[Test build #75263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75263/testReport)** for PR 16541 at commit [`d04e043`](https://github.com/apache/spark/commit/d04e043fcd00204531553cb0a8ac1148d85436f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16541 **[Test build #75263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75263/testReport)** for PR 16541 at commit [`d04e043`](https://github.com/apache/spark/commit/d04e043fcd00204531553cb0a8ac1148d85436f4).
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16541 retest this please
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16541 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16541 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75255/
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16541 LGTM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16541 **[Test build #75255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75255/testReport)** for PR 16541 at commit [`d04e043`](https://github.com/apache/spark/commit/d04e043fcd00204531553cb0a8ac1148d85436f4).
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16541 ok to test
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Thanks. Made the suggested changes in my latest commit. I also encountered a minor problem when doing final testing. When using a collection type that is a type alias (e.g., scala.List), the companion object's `newBuilder` could not be found. Fixed it by dealiasing the collection type before obtaining the companion object.
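The alias problem can be reproduced with plain Scala runtime reflection. This is a hypothetical illustration of the fix, not the PR's actual code: `scala.List` is itself a type alias for `scala.collection.immutable.List`, so the type has to be dealiased before looking up the companion object.

```scala
import scala.reflect.runtime.universe._

// scala.List is a type alias defined in the scala package object:
//   type List[+A] = scala.collection.immutable.List[A]
val aliased = typeOf[scala.List[Int]]

// Dealiasing resolves the alias to the underlying class, whose
// companion object is the one that actually provides newBuilder.
val dealiased = aliased.dealias
val companion = dealiased.typeSymbol.asClass.companion

println(dealiased.typeSymbol.fullName) // the underlying immutable.List class
```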
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16541 LGTM except 2 minor comments
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 That seems to be the case here, yes. What about the other benefits I mentioned (adding support for Java `List`s and future Scala 2.13 compatibility)? I think the codegen is also more straightforward/clear (and much shorter).
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16541 I didn't look into the details here, but very often scanning data twice doesn't necessarily slow things down, especially in the case of sequential scan.
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Well, technically yes. But I would say it's a little more than that. The current approach to deserialization of `Seq`s is to copy the data into an array, construct a `WrappedArray` (which extends `Seq`) and optionally copy the data (again) into the new collection. This needs to go through all the elements twice if anything other than a `Seq` (or `WrappedArray` directly) is requested. This PR takes a more straightforward approach and constructs a mutable collection builder (which is defined for every Scala collection), adds the elements to it and retrieves the result. I assumed this would be a performance improvement and am quite surprised that there is no difference. But I think this might be because the improvement is so small that it drowns in the usual operation overhead. Sadly, I do not have the resources to measure the operation on larger amounts of data, as the benchmarks fail on larger collections on my setup. I am also not familiar enough with Spark's internals to determine whether @kiszk's suggestion would improve operations in other ways, e.g. during operations in the cluster environment. That is why I decided to implement it but keep it separate from my proposed changes for the time being. As to the benefits this PR would bring other than performance improvements:
* I would like to implement support for Java `List`s in a manner similar to what I did with `Map`s in #16986 (I would also ask you to take a look at that when you have the time) by just slightly altering the code.
* I didn't know this when I implemented this, but Scala 2.13 will introduce a major [collections rework](https://www.scala-lang.org/blog/2017/02/28/collections-rework.html) that will change the way the `to` method (used for conversions in the current solution) works. This will require a rewrite, whereas I believe the method used here will remain largely the same.
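The single-pass, builder-based idea described above can be sketched in plain Scala. This is a simplified illustration of the technique, not Spark's actual generated code:

```scala
import scala.collection.mutable

// Append elements from the decoded array directly to the target collection's
// builder in a single pass, instead of array -> WrappedArray -> conversion
// (which traverses the elements twice for anything other than a Seq).
def toCollection[A, C](input: Array[A], builder: mutable.Builder[A, C]): C = {
  builder.sizeHint(input.length) // pre-allocate where the collection supports it
  var i = 0
  while (i < input.length) {
    builder += input(i)
    i += 1
  }
  builder.result()
}

// Works for any collection that exposes a builder via its companion object:
val list  = toCollection(Array(1, 2, 3), List.newBuilder[Int])
val queue = toCollection(Array(1, 2, 3), mutable.Queue.newBuilder[Int])
```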
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16541 Is it a performance improvement? There is no difference in the benchmark results.
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Also please note the [UnsafeArrayData-producing branch](https://github.com/michalsenkyr/spark/compare/dataset-seq-builder...michalsenkyr:dataset-seq-builder-unsafe) that is not yet merged into this branch. I'd like to get somebody's opinion on that before I do it.
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Would it be possible for somebody to review this PR for me? I have a few ideas that are dependent on this and I'd like to get to work on them. Most notably support for Java Lists. Maybe @cloud-fan could take a look at this?
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Apologies for taking so long. I tried modifying the serialization logic as best as I could to serialize into `UnsafeArrayData` ([branch diff](https://github.com/michalsenkyr/spark/compare/dataset-seq-builder...michalsenkyr:dataset-seq-builder-unsafe)). I had to first convert into an array in order to use `fromPrimitiveArray` on the result. That's probably the reason why the benchmark came up slightly worse:
```
OpenJDK 64-Bit Server VM 1.8.0_121-b13 on Linux 4.9.6-1-ARCH
AMD A10-4600M APU with Radeon(tm) HD Graphics

collect:              Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
-------------------------------------------------------------------------------
Seq                         256 /  287          0,0       255670,1        1,0X
List                        161 /  220          0,0       161091,7        1,6X
mutable.Queue               304 /  324          0,0       303823,3        0,8X
```
I am not entirely sure how `GenericArrayData` and `UnsafeArrayData` are handled during transformations and shuffles, though, so it's possible that more complex tests will reveal better performance. However, I'm not sure that I can test this properly on my single-machine setup. I'd definitely be interested in benchmark results on a cluster setup.
Generated code:
```
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator[] inputs;
/* 008 */   private scala.collection.Iterator inputadapter_input;
/* 009 */   private boolean CollectObjects_loopIsNull1;
/* 010 */   private int CollectObjects_loopValue0;
/* 011 */   private UnsafeRow deserializetoobject_result;
/* 012 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder deserializetoobject_holder;
/* 013 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter deserializetoobject_rowWriter;
/* 014 */   private scala.collection.immutable.List mapelements_argValue;
/* 015 */   private UnsafeRow mapelements_result;
/* 016 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder mapelements_holder;
/* 017 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter mapelements_rowWriter;
/* 018 */   private scala.collection.immutable.List serializefromobject_argValue;
/* 019 */   private UnsafeRow serializefromobject_result;
/* 020 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder serializefromobject_holder;
/* 021 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter serializefromobject_rowWriter;
/* 022 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter serializefromobject_arrayWriter;
/* 023 */
/* 024 */   public GeneratedIterator(Object[] references) {
/* 025 */     this.references = references;
/* 026 */   }
/* 027 */
/* 028 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 029 */     partitionIndex = index;
/* 030 */     this.inputs = inputs;
/* 031 */     inputadapter_input = inputs[0];
/* 032 */
/* 033 */     deserializetoobject_result = new UnsafeRow(1);
/* 034 */     this.deserializetoobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(deserializetoobject_result, 32);
/* 035 */     this.deserializetoobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(deserializetoobject_holder, 1);
/* 036 */
/* 037 */     mapelements_result = new UnsafeRow(1);
/* 038 */     this.mapelements_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(mapelements_result, 32);
/* 039 */     this.mapelements_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(mapelements_holder, 1);
/* 040 */
/* 041 */     serializefromobject_result = new UnsafeRow(1);
/* 042 */     this.serializefromobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(serializefromobject_result, 32);
/* 043 */     this.serializefromobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(serializefromobject_holder, 1);
/* 044 */     this.serializefromobject_arrayWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
/* 045 */
/* 046 */   }
/* 047 */
/* 048 */   protected void process
```
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 I added the benchmarks based on the code you provided, but I am getting almost the same results before and after the optimization (see description). So either the added benefit is really small or I didn't write/tune the benchmarks quite right. I would appreciate it if you could take a look at them. I will also take a look at the `UnsafeArrayData` optimization later and try to include it in my next commit.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16541 Could we get an additional performance improvement if we generated `UnsafeArrayData` instead of `GenericArrayData` for this statement?
```
/* 104 */ final ArrayData serializefromobject_value = false ? null : new org.apache.spark.sql.catalyst.util.GenericArrayData(serializefromobject_argValue);
```
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16541 This looks like a similar optimization to https://github.com/apache/spark/pull/15044. Does [this code](https://github.com/apache/spark/pull/15044/files#diff-d6f03c9d3e82f3774d1110559b039a6d) help you?
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Added benchmarks. I didn't find any standardized way of benchmarking codegen so I wrote a simple script for Spark Shell. Benchmarks were run on a laptop so the collections couldn't be too large. Nevertheless, the benchmarks are consistently (even if not significantly) faster.
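A simple spark-shell timing script of the kind described above could look like the following. This is a hypothetical sketch of a best-of-N timing helper, not the actual benchmark script from the PR:

```scala
// Hypothetical best-of-N timing helper for ad-hoc spark-shell benchmarks.
// Returns the name together with the best observed wall-clock time in ms.
def benchmark[T](name: String, iters: Int = 5)(body: => T): (String, Long) = {
  body // warm up once so JIT compilation doesn't dominate the measurement
  val times = (1 to iters).map { _ =>
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }
  (name, times.min / 1000000) // best time, nanoseconds -> milliseconds
}

// Usage in spark-shell would be along the lines of:
//   benchmark("List")(ds.map(identity).collect())
val noop = benchmark("noop") { 1 + 1 }
```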
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Added codegen comparison for a simple `List` dataset. I will also prepare a benchmark and add some results later. Those will be for `List`, `mutable.Queue` and `Seq`, where `List` and `mutable.Queue` should benefit from the change (one less pass) and `Seq` should stay approximately the same (as it is a supertype of `WrappedArray` and therefore skips the final conversion in the original approach).
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16541 Is this a perf optimization? If yes, can you show some benchmarks? Also for codegen it's good to show the generated code before/after this change. You can get that with
```
df.queryExecution.debug.codegen()
```
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16541 Also, the new `CollectObjects` copies quite a bit of code from `MapObjects`. Should I move the code into a common trait to reduce duplication, or should I leave it as is?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16541 Can one of the admins verify this patch?