[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20415
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20415
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86898/
Test PASSed.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20415
  
**[Test build #86898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86898/testReport)** for PR 20415 at commit [`5d70fd1`](https://github.com/apache/spark/commit/5d70fd1f939a67707f16c1afdefea6d4342c019e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/20415
  
@mgaido91 thanks.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20415
  
**[Test build #86898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86898/testReport)** for PR 20415 at commit [`5d70fd1`](https://github.com/apache/spark/commit/5d70fd1f939a67707f16c1afdefea6d4342c019e).





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-31 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20415
  
LGTM, only a minor comment





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20415
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86860/
Test PASSed.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20415
  
**[Test build #86860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86860/testReport)** for PR 20415 at commit [`e3e09d9`](https://github.com/apache/spark/commit/e3e09d98072bd39328a4e7d4de1ddd38594c6232).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20415
  
**[Test build #86860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86860/testReport)** for PR 20415 at commit [`e3e09d9`](https://github.com/apache/spark/commit/e3e09d98072bd39328a4e7d4de1ddd38594c6232).





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20415
  
Looks like a reasonable change to me. Although I don't expect a significant 
performance improvement, it makes the code more compact.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20415
  
ok to test





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-30 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/20415
  
@cloud-fan Could you help review this? Thanks.





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-29 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/20415
  
@hvanhovell, thank you for reviewing it.
I tested the change in this PR as follows.

**in FileSourceScanExec->doExecute code:**

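```
// Fragment of FileSourceScanExec.doExecute; `scan`, `schema`, and
// `numOutputRows` come from the surrounding code (not shown).
if (needsUnsafeRowConversion) {
  // Combined: convert to UnsafeRow and update the metric in a single map.
  scan.mapPartitionsWithIndexInternal { (index, iter) =>
    val proj = UnsafeProjection.create(schema)
    proj.initialize(index)
    iter.map { r =>
      numOutputRows += 1
      proj(r)
    }
  }
} else {
  // Separate: convert first, then update the metric in a second map.
  val scanOther = scan.mapPartitionsWithIndexInternal { (index, iter) =>
    val proj = UnsafeProjection.create(schema)
    proj.initialize(index)
    iter.map(proj)
  }

  scanOther.map { r =>
    numOutputRows += 1
    r
  }
}
```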
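
The two branches above differ only in whether the projection and the metric update share one `map`. As a minimal sketch with plain Scala iterators (no Spark involved), the following shows that both shapes yield the same rows and the same count. `project` and the local counter are hypothetical stand-ins for `UnsafeProjection` and the `numOutputRows` metric, not Spark APIs.

```scala
// Hypothetical stand-ins: `project` plays the role of the UnsafeProjection,
// and a plain local counter plays the role of the numOutputRows metric.
object FusedVsSeparateMap {
  def project(r: Int): String = s"row-$r"

  // Combined form: a single map that both counts and projects.
  def fused(rows: Iterator[Int]): (List[String], Long) = {
    var count = 0L
    val out = rows.map { r =>
      count += 1
      project(r)
    }.toList // forces the iterator, so `count` is final below
    (out, count)
  }

  // Separate form: project first, then count in a second map.
  def separate(rows: Iterator[Int]): (List[String], Long) = {
    var count = 0L
    val projected = rows.map(project)
    val out = projected.map { r =>
      count += 1
      r
    }.toList
    (out, count)
  }
}
```

Calling `FusedVsSeparateMap.fused(Iterator(1, 2, 3))` and `FusedVsSeparateMap.separate(Iterator(1, 2, 3))` should return equal results; the only difference is one extra iterator wrapper in the separate form.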

**Start spark-shell:**

> ./spark-shell --executor-memory 15G --total-executor-cores 1 --conf spark.executor.cores=1


**test code:**

> val df4 = (0 until 50).map(i => (i % 2, i % 3, i % 4, i % 5, i % 6, i % 7, i % 8)).toDF("i2","i3","i4","i5","i6","i7","i8")
> df4.write.format("parquet").partitionBy("i2", "i3", "i4").bucketBy(8, "i5","i6","i7","i8").saveAsTable("table50")
> 
> def runBenchmark(name: String, cardinality: Int)(f: => Unit): Unit = {
>   val startTime = System.nanoTime
>   (0 to cardinality).foreach(i => f)
>   val endTime = System.nanoTime
>   // System.nanoTime is in nanoseconds; divide by 1e9 to report seconds.
>   println(s"Time taken in $name: " + (endTime - startTime).toDouble / 1e9 + " seconds")
> }
> 
> def benchmark(name: String, card: Int)(f: => Unit): Unit = {
>   (0 to card).foreach(i => f)
> }
> 
> // After the FileSourceScanExec change:
> benchmark("File SourceScan Exec", 2) {
>   runBenchmark("After modified File SourceScan Exec ", 200) {
>     spark.conf.set("spark.sql.codegen.maxFields", 2)
>     spark.conf.set("spark.sql.parquet.enableVectorizedReader", true)
>     spark.sql("select * from table50").count()
>   }
> }
> 
> // Before the FileSourceScanExec change:
> benchmark("File SourceScan Exec", 2) {
>   runBenchmark("Before modified File SourceScan Exec ", 200) {
>     spark.conf.set("spark.sql.codegen.maxFields", 2)
>     spark.conf.set("spark.sql.parquet.enableVectorizedReader", false)
>     spark.sql("select * from table50").count()
>   }
> }
> 

**test result:**

> 
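> Test 20 times:
> 
> | Version         | First run (s) | Second run (s) | Third run (s) | Avg (s) |
> |-----------------|---------------|----------------|---------------|---------|
> | Before modified | 10.97         | 10.83          | 11.05         | 10.95   |
> | After modified  | 9.33          | 9.61           | 9.32          | 9.42    |
> 
> Test 100 times:
> 
> | Version         | First run (s) | Second run (s) | Third run (s) | Avg (s) |
> |-----------------|---------------|----------------|---------------|---------|
> | Before modified | 51.74         | 52.80          | 71.88         | 58.80   |
> | After modified  | 47.24         | 46.18          | 48.92         | 47.45   |
> 
> Test 200 times:
> 
> | Version         | First run (s) | Second run (s) | Third run (s) | Avg (s) |
> |-----------------|---------------|----------------|---------------|---------|
> | Before modified | 236.85        | 325.97         | 395.69        | 319.50  |
> | After modified  | 208.90        | 244.13         | 261.18        | 238.07  |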
> 
> 
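
For a quick read of the averages above, the relative reduction in elapsed time can be computed as `(before - after) / before`. The sketch below is only arithmetic on the reported averages; the object and method names are made up for illustration. By this calculation the reductions come out at roughly 14%, 19%, and 25% for the 20-, 100-, and 200-iteration runs.

```scala
object BenchmarkSummary {
  // (iterations, average seconds before, average seconds after),
  // copied from the "avg(s)" column of the results above.
  val averages: Seq[(Int, Double, Double)] = Seq(
    (20, 10.95, 9.42),
    (100, 58.80, 47.45),
    (200, 319.50, 238.07)
  )

  // Relative reduction in percent: (before - after) / before * 100.
  def reductionPercent(before: Double, after: Double): Double =
    (before - after) / before * 100.0

  def main(args: Array[String]): Unit =
    averages.foreach { case (n, before, after) =>
      println(f"$n%3d iterations: ${reductionPercent(before, after)}%.1f%% less time")
    }
}
```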

thanks.








[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-27 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20415
  
@heary-cao have you benchmarked this? I am asking because Spark SQL chains 
iterators; they are pipelined and only materialized when needed. Your PR 
effectively removes two virtual calls (hasNext/next) per tuple, so I don't 
see much benefit here.
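
To illustrate the pipelining point, here is a minimal sketch using plain Scala iterators rather than Spark's row iterators. The `trace` helper and its log messages are hypothetical. It records the order in which two chained `map` stages touch each element, showing that nothing runs until the chain is consumed and that each element passes through both stages before the next one is read.

```scala
import scala.collection.mutable.ArrayBuffer

object IteratorPipelining {
  // Records the order in which the two chained stages touch each element.
  def trace(input: Seq[Int]): List[String] = {
    val log = ArrayBuffer.empty[String]
    val stage1 = input.iterator.map { x => log += s"project($x)"; x * 10 }
    val stage2 = stage1.map { y => log += s"count($y)"; y }
    // Iterator.map is lazy, so neither stage has run yet.
    assert(log.isEmpty)
    // Draining the outer iterator pulls elements through both stages one at a time.
    stage2.foreach(_ => ())
    log.toList
  }
}
```

`IteratorPipelining.trace(Seq(1, 2))` returns `List("project(1)", "count(10)", "project(2)", "count(20)")`: the stages interleave per element, so the second `map` adds only an extra `hasNext`/`next` pair per element and no intermediate collection.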





[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...

2018-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20415
  
Can one of the admins verify this patch?




