[jira] [Commented] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)
[ https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932616#comment-16932616 ] Qihong Chen commented on KAFKA-7500:

Hi [~ryannedolan], thanks for your quick response! Now I know what "primary" means. According to the blog [A look inside Kafka MirrorMaker2|https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/] by Renu Tewari:

{quote}In MM2 only one connect cluster is needed for all the cross-cluster replications between a pair of datacenters. Now if we simply take a Kafka source and sink connector and deploy them in tandem to do replication, the data would need to hop through an intermediate Kafka cluster. MM2 avoids this unnecessary data copying by a direct passthrough from source to sink.{quote}

This is exactly what I want! Is there a release schedule for the *SinkConnector* and the *direct passthrough from source to sink* feature?

> MirrorMaker 2.0 (KIP-382)
> -
>
> Key: KAFKA-7500
> URL: https://issues.apache.org/jira/browse/KAFKA-7500
> Project: Kafka
> Issue Type: New Feature
> Components: KafkaConnect, mirrormaker
> Affects Versions: 2.4.0
> Reporter: Ryanne Dolan
> Assignee: Manikumar
> Priority: Major
> Labels: pull-request-available, ready-to-commit
> Fix For: 2.4.0
>
> Attachments: Active-Active XDCR setup.png
>
> Implement a drop-in replacement for MirrorMaker leveraging the Connect framework.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0]
> [https://github.com/apache/kafka/pull/6295]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)
[ https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931743#comment-16931743 ] Qihong Chen edited comment on KAFKA-7500 at 9/17/19 7:22 PM:

Hi [~ryannedolan], I just listened to your Kafka Power Chat talk, thanks! I have a follow-up question about the last question (from me) you answered in the talk.

You said you prefer a dedicated MM2 cluster over running MM2 in a Connect cluster, since fewer clusters are needed to replicate among multiple Kafka clusters. But there's no REST API for a dedicated MM2 cluster that can report the status of the replication streams or update the replication configuration. Any configuration change means updating the config files and restarting all MM2 instances. Is that right? Or did I miss that a dedicated MM2 cluster does provide a REST API for administration and monitoring? If so, where is it?

If my understanding is correct, we can achieve the same thing with MM2 in a Connect cluster. Assume there are three Kafka clusters: A, B, and C. Set up a Connect cluster against C (meaning all topics for the connectors' data and state go to cluster C), then set up MM2 connectors to replicate data and metadata A -> B and B -> A. If this works, we can use Kafka cluster C plus the Connect cluster running against it to replicate data among even more Kafka clusters, such as A, B, and D. Of course, this needs a more complicated configuration, which requires a deeper understanding of how the MM2 connectors work. In this scenario, the Connect cluster provides a REST API to administer and monitor all the connectors. This would be useful for people who can't use Streams Replication Manager from Cloudera or Replicator from Confluent for some reason. Is this right?

-- This message was sent by Atlassian Jira (v8.3.2#803003)
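To make the Connect-cluster option above concrete, here is a sketch of a connector config that could be POSTed to a Connect worker's REST API (adapted from the examples in KIP-382; treat the exact keys, the connector name, and the A/B aliases and broker addresses as illustrative):

```
{
  "name": "mm2-a-to-b",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "A",
    "target.cluster.alias": "B",
    "source.cluster.bootstrap.servers": "a-broker1:9092,a-broker2:9092",
    "topics": "test.*"
  }
}
```

A MirrorCheckpointConnector (and, if heartbeats are wanted, a MirrorHeartbeatConnector) would be configured the same way for each flow, and the Connect cluster's REST API then reports status for each connector individually.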
[jira] [Comment Edited] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)
[ https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927863#comment-16927863 ] Qihong Chen edited comment on KAFKA-7500 at 9/11/19 6:13 PM:

[~ryannedolan] Thanks for all your efforts on MM2. I'd appreciate your input on the following questions.

I'm trying to replicate topics from one cluster to another, including the topic data and the related consumers' offsets. But I only see the topic data replicated, not the consumer offsets. Here's my mm2.properties:

{code:java}
clusters = dr1, dr2
dr1.bootstrap.servers = 10.1.0.4:9092,10.1.0.5:9092,10.1.0.6:9092
dr2.bootstrap.servers = 10.2.0.4:9092,10.2.0.5:9092,10.2.0.6:9092

# only allow replication dr1 -> dr2
dr1->dr2.enabled = true
dr1->dr2.topics = test.*
dr1->dr2.groups = test.*
dr1->dr2.emit.heartbeats.enabled = false
dr2->dr1.enabled = false
dr2->dr1.emit.heartbeats.enabled = false
{code}

Here's how I started the MM2 cluster (dr2 as the nearby cluster):

{code:java}
nohup bin/connect-mirror-maker.sh mm2.properties --clusters dr2 > mm2.log 2>&1 &
{code}

On dr1, there's a topic *test1* and a consumer group *test1grp* for topic test1. On dr2, I found the following topics:

{code:java}
__consumer_offsets
dr1.checkpoints.internal
dr1.test1
heartbeats
mm2-configs.dr1.internal
mm2-offsets.dr1.internal
mm2-status.dr1.internal
{code}

But I couldn't find any consumer group on dr2 related to consumer group *test1grp*. Could you please explain in detail how to migrate consumer group *test1grp* from dr1 to dr2, i.e. what command(s) need to run to set up the offsets for *test1grp* on dr2 before consuming topic *dr1.test1*?

By the way, how do I set this up and run it in a Kafka Connect cluster, i.e. how do I configure MirrorSourceConnector and MirrorCheckpointConnector in a Connect cluster? Is there documentation about this?

-- This message was sent by Atlassian Jira (v8.3.2#803003)
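For context on the offset question above: the dr1.checkpoints.internal topic is where MM2 records the group-offset mappings. The sketch below shows the translation idea only, with a hypothetical Checkpoint class standing in for the records MirrorCheckpointConnector emits; it is not the Kafka API (in Kafka 2.4 the actual entry point is org.apache.kafka.connect.mirror.RemoteClusterUtils.translateOffsets).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Self-contained sketch of MM2's checkpoint-based offset translation.
// Checkpoint is a hypothetical stand-in for the records MirrorCheckpointConnector
// writes to <source>.checkpoints.internal on the target cluster.
class CheckpointSketch {
    static final class Checkpoint {
        final String group;          // consumer group on the source cluster
        final String remoteTopic;    // renamed topic on the target, e.g. "dr1.test1"
        final int partition;
        final long downstreamOffset; // equivalent offset on the target cluster
        Checkpoint(String group, String remoteTopic, int partition, long downstreamOffset) {
            this.group = group; this.remoteTopic = remoteTopic;
            this.partition = partition; this.downstreamOffset = downstreamOffset;
        }
    }

    // Build the offsets a consumer on the target cluster should seek to for `group`
    // before it starts reading the replicated topic.
    static Map<String, Long> translate(List<Checkpoint> checkpoints, String group) {
        Map<String, Long> offsets = new HashMap<>();
        for (Checkpoint c : checkpoints) {
            if (c.group.equals(group)) {
                offsets.put(c.remoteTopic + "-" + c.partition, c.downstreamOffset);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<Checkpoint> cps = new ArrayList<>();
        cps.add(new Checkpoint("test1grp", "dr1.test1", 0, 42L));
        cps.add(new Checkpoint("othergrp", "dr1.test1", 0, 7L));
        System.out.println(translate(cps, "test1grp")); // {dr1.test1-0=42}
    }
}
```

In other words, a small client would read the checkpoints, translate them, and commit the resulting offsets for test1grp on dr2 before consumers switch over.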
Re: Merging GEODE-9 Spark Connector project into develop
+1 to merge

On Fri, Jul 24, 2015 at 9:54 AM, Dan Smith wrote:
> +1 to merge.
>
> -Dan
>
> On Thu, Jul 23, 2015 at 11:44 PM, Jianxia Chen wrote:
> > +1 for merge
> >
> > On Thu, Jul 23, 2015 at 4:27 PM, Anilkumar Gingade wrote:
> > > +1 for merge.
> > >
> > > On Thu, Jul 23, 2015 at 10:19 AM, Jason Huynh wrote:
> > > > Greetings,
> > > >
> > > > We are hoping to merge GEODE-9 into develop. GEODE-9 is the feature work for the gemfire/geode spark connector. This work had previously been done on a private repo prior to Geode being in incubation and is quite large.
> > > >
> > > > This merge will create a subproject in Geode named gemfire-spark-connector. This project uses sbt to do its build and has not yet been connected to the Geode build system. There will be future work to better incorporate it with the gradle build, as well as to remove the geode jar dependency.
> > > >
> > > > This project has a separate set of readme/tutorial docs as well as its own tests and integration tests. These also have not been integrated with the automated testing and will need to be included at some point.
> > > >
> > > > The hope was to get this merged in and do the remaining work in smaller, easier-to-digest chunks, as well as possibly getting other contributors to help with these efforts.
> > > >
> > > > Currently there is a review for this entire change at: https://reviews.apache.org/r/36731/
> > > > It will probably be easier to just get a checkout of the branch to see what it looks like.
> > > >
> > > > Please voice any concerns, suggestions or questions on this thread.
> > > >
> > > > Thanks!
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
Forgot to ask: do we need the .gitignore file? There's only one line, "/bin/", in it. Thanks!
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36731/#review92775
---

Ship it!

Ship It!

- Qihong Chen

On July 23, 2015, 4:29 p.m., Jason Huynh wrote:
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36731/
> ---
>
> (Updated July 23, 2015, 4:29 p.m.)
>
> Review request for geode, anilkumar gingade, Bruce Schuchardt, Jianxia Chen, Lynn Gallinat, and Qihong Chen.
>
> Repository: geode
>
> Description
> ---
> The diff is of all the files in the GEODE-9 feature branch. It will probably be easier just to get a checkout of the feature branch and peruse the gemfire-spark-connector directory. We hope to merge this in and make future changes to fully integrate the connector into the gradle build at a future time.
>
> Diffs
> -
> gemfire-spark-connector/.gitignore PRE-CREATION
> gemfire-spark-connector/README.md PRE-CREATION
> gemfire-spark-connector/doc/10_demos.md PRE-CREATION
> gemfire-spark-connector/doc/1_building.md PRE-CREATION
> gemfire-spark-connector/doc/2_quick.md PRE-CREATION
> gemfire-spark-connector/doc/3_connecting.md PRE-CREATION
> gemfire-spark-connector/doc/4_loading.md PRE-CREATION
> gemfire-spark-connector/doc/5_rdd_join.md PRE-CREATION
> gemfire-spark-connector/doc/6_save_rdd.md PRE-CREATION
> gemfire-spark-connector/doc/7_save_dstream.md PRE-CREATION
> gemfire-spark-connector/doc/8_oql.md PRE-CREATION
> gemfire-spark-connector/doc/9_java_api.md PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/RegionMetadata.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/QueryFunction.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/RetrieveRegionFunction.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/RetrieveRegionMetadataFunction.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/StructStreamingResultSender.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/Employee.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/JavaApiIntegrationTest.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/Portfolio.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/Position.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/resources/test-regions.xml PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/resources/test-retrieve-regions.xml PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/BasicIntegrationTest.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/RDDJoinRegionIntegrationTest.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/RetrieveRegionIntegrationTest.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/package.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/testkit/GemFireCluster.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/testkit/GemFireRunner.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/testkit/IOUtils.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/org/apache/spark/streaming/ManualClockHelper.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
I'm not talking about changes to the GEODE-9 feature branch. I just want to know whether this merge simply copies all files from the GEODE-9 feature branch to the develop branch. Thanks!
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
Hi Jason, are there any "real" changes from the GEODE-9 feature branch other than adding all the files to the develop branch? Thanks, Qihong
[jira] [Created] (GEODE-137) Spark Connector: should connect to local GemFire server if possible
Qihong Chen created GEODE-137:
-

Summary: Spark Connector: should connect to local GemFire server if possible
Key: GEODE-137
URL: https://issues.apache.org/jira/browse/GEODE-137
Project: Geode
Issue Type: Bug
Reporter: Qihong Chen
Assignee: Qihong Chen

DefaultGemFireConnection uses ClientCacheFactory with locator info to create the ClientCache instance. In this case, the ClientCache doesn't connect to the GemFire/Geode server on the same host even if there is one. This causes more network traffic and is less efficient. ClientCacheFactory can also create a ClientCache from GemFire server info directly. Therefore, we can force the ClientCache to connect to the local GemFire server when possible.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
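A minimal sketch of the selection logic being proposed: prefer a server on the local host when building the pool against servers rather than locators. LocalServerPicker and pickServer() are hypothetical, not connector code; the real change would presumably feed the chosen server to ClientCacheFactory.addPoolServer instead of addPoolLocator.

```java
import java.util.List;

// Sketch of the server-selection idea in GEODE-137: given the servers a locator
// returns, prefer one running on this host so reads and writes stay local.
class LocalServerPicker {
    // Return the first "host:port" entry whose host part matches `localHost`,
    // falling back to the first server in the list.
    static String pickServer(List<String> servers, String localHost) {
        for (String s : servers) {
            if (s.split(":")[0].equals(localHost)) return s;
        }
        return servers.get(0);
    }

    public static void main(String[] args) {
        List<String> servers = List.of("10.1.0.4:40404", "10.1.0.5:40404");
        System.out.println(pickServer(servers, "10.1.0.5")); // 10.1.0.5:40404
    }
}
```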
[jira] [Resolved] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M entries per partition)
[ https://issues.apache.org/jira/browse/GEODE-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen resolved GEODE-120.
---
Resolution: Fixed

> RDD.saveToGemfire() can not handle big dataset (1M entries per partition)
> -
>
> Key: GEODE-120
> URL: https://issues.apache.org/jira/browse/GEODE-120
> Project: Geode
> Issue Type: Sub-task
> Components: core, extensions
> Affects Versions: 1.0.0-incubating
> Reporter: Qihong Chen
> Assignee: Qihong Chen
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> The connector uses a single region.putAll() call to save each RDD partition, but putAll() doesn't handle big datasets well (such as 1M entries). We need to split the dataset into smaller chunks and invoke putAll() for each chunk.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
[ https://issues.apache.org/jira/browse/GEODE-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen closed GEODE-114.
-

> There's race condition in DefaultGemFireConnection.getRegionProxy
> -
>
> Key: GEODE-114
> URL: https://issues.apache.org/jira/browse/GEODE-114
> Project: Geode
> Issue Type: Sub-task
> Components: core, extensions
> Affects Versions: 1.0.0-incubating
> Reporter: Qihong Chen
> Assignee: Qihong Chen
> Fix For: 1.0.0-incubating
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When multiple threads call getRegionProxy with the same region at the same time, the following exception is thrown:
>
> com.gemstone.gemfire.cache.RegionExistsException: /debs
>   at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2880)
>   at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2835)
>   at com.gemstone.gemfire.cache.client.internal.ClientRegionFactoryImpl.create(ClientRegionFactoryImpl.java:223)
>   at io.pivotal.gemfire.spark.connector.internal.DefaultGemFireConnection.getRegionProxy(DefaultGemFireConnection.scala:87)
>   at io.pivotal.gemfire.spark.connector.internal.rdd.GemFirePairRDDWriter.write(GemFireRDDWriter.scala:47)
>   at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
>   at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:64)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Subscription request
Send an empty email to user-subscribe@geode.incubator.apache.org.
[jira] [Resolved] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
[ https://issues.apache.org/jira/browse/GEODE-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen resolved GEODE-114.
---
Resolution: Fixed
Fix Version/s: 1.0.0-incubating

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M entries per partition)
[ https://issues.apache.org/jira/browse/GEODE-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen updated GEODE-120:
--
Summary: RDD.saveToGemfire() can not handle big dataset (1M entries per partition) (was: RDD.saveToGemfire() can not handle big dataset (1M record per partition))

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Unable to start locator locally
Make sure JAVA_HOME is set in the terminal you used for gfsh. I got a similar error the other day. Remember to kill all Geode processes before trying again. Qihong
[jira] [Commented] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M record per partition)
[ https://issues.apache.org/jira/browse/GEODE-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628902#comment-14628902 ] Qihong Chen commented on GEODE-120:
---
Posted code review request: https://reviews.apache.org/r/36530/

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M record per partition)
Qihong Chen created GEODE-120:
-

Summary: RDD.saveToGemfire() can not handle big dataset (1M record per partition)
Key: GEODE-120
URL: https://issues.apache.org/jira/browse/GEODE-120
Project: Geode
Issue Type: Sub-task
Affects Versions: 1.0.0-incubating
Reporter: Qihong Chen
Assignee: Qihong Chen

The connector uses a single region.putAll() call to save each RDD partition, but putAll() doesn't handle big datasets well (such as 1M records). We need to split the dataset into smaller chunks and invoke putAll() for each chunk.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
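The fix described above, chunking the partition before calling putAll(), can be sketched with a minimal stand-in for the GemFire Region API. The Region interface, savePartition() helper, and chunk size below are illustrative, not connector code:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the GEODE-120 fix: instead of one region.putAll() per RDD partition,
// split the partition's entries into bounded chunks and call putAll() per chunk.
class ChunkedPutAll {
    // Hypothetical minimal interface standing in for GemFire's Region.
    interface Region<K, V> { void putAll(Map<K, V> entries); }

    // Save `entries` in chunks of at most `chunkSize`; returns the number of
    // putAll() invocations performed.
    static <K, V> int savePartition(Region<K, V> region, Map<K, V> entries, int chunkSize) {
        int calls = 0;
        Map<K, V> chunk = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            chunk.put(e.getKey(), e.getValue());
            if (chunk.size() == chunkSize) {   // flush a full chunk
                region.putAll(chunk);
                chunk = new LinkedHashMap<>();
                calls++;
            }
        }
        if (!chunk.isEmpty()) { region.putAll(chunk); calls++; }  // flush the remainder
        return calls;
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new LinkedHashMap<>();
        for (int i = 0; i < 2500; i++) data.put(i, "v" + i);
        Map<Integer, String> sink = new HashMap<>();
        int calls = savePartition(sink::putAll, data, 1000);
        System.out.println(calls + " putAll calls, " + sink.size() + " entries saved");
        // 3 putAll calls, 2500 entries saved
    }
}
```

Bounding each putAll() keeps every server-side batch small regardless of how large the RDD partition is, which is the point of the fix.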
[jira] [Updated] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
[ https://issues.apache.org/jira/browse/GEODE-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen updated GEODE-114:
------------------------------

Description:

When multiple threads call getRegionProxy with the same region at the same time, the following exception is thrown:

com.gemstone.gemfire.cache.RegionExistsException: /debs
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2880)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2835)
	at com.gemstone.gemfire.cache.client.internal.ClientRegionFactoryImpl.create(ClientRegionFactoryImpl.java:223)
	at io.pivotal.gemfire.spark.connector.internal.DefaultGemFireConnection.getRegionProxy(DefaultGemFireConnection.scala:87)
	at io.pivotal.gemfire.spark.connector.internal.rdd.GemFirePairRDDWriter.write(GemFireRDDWriter.scala:47)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

was:

When multiple threads call getRegionProxy with the same region, the following exception is thrown: (same stack trace as above)

> There's race condition in DefaultGemFireConnection.getRegionProxy
> -----------------------------------------------------------------
>
>                 Key: GEODE-114
>                 URL: https://issues.apache.org/jira/browse/GEODE-114
>             Project: Geode
>          Issue Type: Sub-task
>          Components: core, extensions
>    Affects Versions: 1.0.0-incubating
>            Reporter: Qihong Chen
>            Assignee: Qihong Chen
>   Original Estimate: 24h
>  Remaining Estimate: 24h

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
Qihong Chen created GEODE-114:
---------------------------------

             Summary: There's race condition in DefaultGemFireConnection.getRegionProxy
                 Key: GEODE-114
                 URL: https://issues.apache.org/jira/browse/GEODE-114
             Project: Geode
          Issue Type: Sub-task
    Affects Versions: 1.0.0-incubating
            Reporter: Qihong Chen
            Assignee: Qihong Chen

When multiple threads call getRegionProxy with the same region, the following exception is thrown:

com.gemstone.gemfire.cache.RegionExistsException: /debs
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2880)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2835)
	at com.gemstone.gemfire.cache.client.internal.ClientRegionFactoryImpl.create(ClientRegionFactoryImpl.java:223)
	at io.pivotal.gemfire.spark.connector.internal.DefaultGemFireConnection.getRegionProxy(DefaultGemFireConnection.scala:87)
	at io.pivotal.gemfire.spark.connector.internal.rdd.GemFirePairRDDWriter.write(GemFireRDDWriter.scala:47)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
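The bug described above is a classic check-then-create race: two threads both observe that the region proxy does not exist yet, both attempt to create it, and the loser gets RegionExistsException from createVMRegion. As a minimal, hypothetical Java sketch (the real fix lives in DefaultGemFireConnection.scala and goes through GemFire's ClientRegionFactory, which is not reproduced here; RegionProxyCache, RegionProxy, and the creation counter are illustrative names), the two-step lookup-then-create can be made atomic per region name with ConcurrentHashMap.computeIfAbsent:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the connector's region-proxy lookup; not the
// real GemFire API, just the concurrency pattern.
class RegionProxyCache {
    static class RegionProxy {
        final String name;
        RegionProxy(String name) { this.name = name; }
    }

    private final ConcurrentMap<String, RegionProxy> regions = new ConcurrentHashMap<>();
    final AtomicInteger creations = new AtomicInteger(); // counts factory invocations

    // computeIfAbsent invokes the factory atomically, at most once per key,
    // so concurrent callers for the same region share one proxy instead of
    // racing to create it twice (the race that raised RegionExistsException).
    RegionProxy getRegionProxy(String name) {
        return regions.computeIfAbsent(name, n -> {
            creations.incrementAndGet();
            return new RegionProxy(n); // the real code would create the client region here
        });
    }
}
```

An alternative that avoids holding the per-key lock during region creation is to attempt the create optimistically, catch RegionExistsException, and fall back to looking the existing region up; either way, the fix is to make "check, then create" atomic.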
Re: Review Request 36408: Resolve the Geode Spark Connector Build Issue
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36408/#review91625
-----------------------------------------------------------

Ship it!

Ship It!

- Qihong Chen


On July 10, 2015, 11:32 p.m., Jianxia Chen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36408/
> -----------------------------------------------------------
> 
> (Updated July 10, 2015, 11:32 p.m.)
> 
> 
> Review request for geode, anilkumar gingade, Bruce Schuchardt, Jason Huynh, and Qihong Chen.
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> Remove the dependency on Pivotal's internal repo.
> 
> Resolve the conflict of different netty versions between Spark and Geode.
> 
> To build the Geode Spark Connector, first build Geode and publish the jars to the local repo, then build the connector using sbt.
> 
> 
> Diffs
> -----
> 
>   gemfire-spark-connector/doc/1_building.md ece4a9c 
>   gemfire-spark-connector/project/Dependencies.scala 899e182 
>   gemfire-spark-connector/project/Settings.scala ec61884 
> 
> Diff: https://reviews.apache.org/r/36408/diff/
> 
> 
> Testing
> -------
> 
> sbt test it:test
> 
> 
> Thanks,
> 
> Jianxia Chen
> 
Re: Where to place "Spark + GemFire" connector.
The problem is caused by having multiple major dependencies with different release cycles. The Spark Geode Connector depends on two products, Spark and Geode (not counting other dependencies); Spark moves much faster than Geode, and some of its features/code are not backward compatible. Our initial connector implementation depended on Spark 1.2, up until the last week of March 2015. Then Spark 1.3 was released in the last week of March, and some connector features didn't work with Spark 1.3, so we moved on; we now support Spark 1.3 (but not 1.2 any more, though we did create a tag). Two weeks ago, Spark 1.4 was released, and it breaks our connector code again. Therefore, for each Geode release we will probably need multiple Connector releases, and will probably need to maintain the last 2 or 3 Connector releases; for example, we need to support both Spark 1.3 and 1.4 with the current Geode code. The question is how to support this with a single source repository?

Thanks,
Qihong