[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20435


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20435#discussion_r165048049
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
 ---
@@ -403,7 +403,7 @@ class MicroBatchExecution(
   val current = committedOffsets.get(reader).map(off => 
reader.deserializeOffset(off.json))
   reader.setOffsetRange(
 toJava(current),
-Optional.of(available.asInstanceOf[OffsetV2]))
+Optional.of(available.asInstanceOf[streaming.Offset]))
--- End diff --

shall we still use `OffsetV2`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20435#discussion_r165047744
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
 ---
@@ -20,14 +20,15 @@ package org.apache.spark.sql.kafka010
 import org.apache.kafka.common.TopicPartition
 
 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.streaming.reader.{Offset => 
OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => 
OffsetV2, PartitionOffset}
 
 /**
  * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions 
of subscribed topics and
  * their offsets.
  */
 private[kafka010]
-case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, 
Long]) extends OffsetV2 {
+case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long])
+  extends OffsetV2 {
--- End diff --

unnecessary change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20435#discussion_r164953225
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
 ---
@@ -20,14 +20,16 @@ package org.apache.spark.sql.kafka010
 import org.apache.kafka.common.TopicPartition
 
 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.streaming.reader.{Offset => 
OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset
 
 /**
  * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions 
of subscribed topics and
--- End diff --

Should this `Offset` be `streaming.Offset`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20435#discussion_r164953380
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala
 ---
@@ -30,9 +30,8 @@ import 
org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
 import 
org.apache.spark.sql.catalyst.streaming.InternalOutputModes.{Append, Complete, 
Update}
 import org.apache.spark.sql.execution.streaming.Sink
 import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2}
-import org.apache.spark.sql.sources.v2.streaming.StreamWriteSupport
-import org.apache.spark.sql.sources.v2.streaming.writer.StreamWriter
-import org.apache.spark.sql.sources.v2.writer._
+import org.apache.spark.sql.sources.v2.writer.{StreamWriteSupport, _}
--- End diff --

`import org.apache.spark.sql.sources.v2.writer._`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20435#discussion_r164943009
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
 ---
@@ -20,14 +20,16 @@ package org.apache.spark.sql.kafka010
 import org.apache.kafka.common.TopicPartition
 
 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.streaming.reader.{Offset => 
OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset
--- End diff --

can we keep it same as before?
```
import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => 
OffsetV2, PartitionOffset}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-30 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/20435

[SPARK-23268][SQL]Reorganize packages in data source V2

## What changes were proposed in this pull request?
1. create a new package for partitioning/distribution related classes.
As Spark will add new concrete implementations of `Distribution` in new 
releases, it is good to 
have a new package for partitioning/distribution related classes.

2. move streaming related class to package 
`org.apache.spark.sql.sources.v2.reader/writer.streaming`, instead of 
`org.apache.spark.sql.sources.v2.streaming.reader/writer`.
So that the there won't be package reader/writer inside package streaming, 
which is quite confusing.
## How was this patch tested?
Unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark new_pkg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20435.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20435


commit da29b863970b8ca5039d547ae5240e5c34f9f11a
Author: Wang Gengliang 
Date:   2018-01-30T08:31:10Z

create a new package for partitioning/distribution related classes

commit 3dc56226ed17637c808df9d8d03f60fcbbae9e55
Author: Wang Gengliang 
Date:   2018-01-30T09:16:01Z

re-org streaming




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org