[jira] [Commented] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)
[ https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932616#comment-16932616 ] Qihong Chen commented on KAFKA-7500:

Hi [~ryannedolan], thanks for your quick response! Now I know what "primary" means. According to the blog [A look inside Kafka MirrorMaker2|https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/] by Renu Tewari:

{quote}In MM2 only one connect cluster is needed for all the cross-cluster replications between a pair of datacenters. Now if we simply take a Kafka source and sink connector and deploy them in tandem to do replication, the data would need to hop through an intermediate Kafka cluster. MM2 avoids this unnecessary data copying by a direct passthrough from source to sink.{quote}

This is exactly what I want! Is there a release schedule for the *SinkConnector* and the *direct passthrough from source to sink* feature?

> MirrorMaker 2.0 (KIP-382)
> -
>
> Key: KAFKA-7500
> URL: https://issues.apache.org/jira/browse/KAFKA-7500
> Project: Kafka
> Issue Type: New Feature
> Components: KafkaConnect, mirrormaker
> Affects Versions: 2.4.0
> Reporter: Ryanne Dolan
> Assignee: Manikumar
> Priority: Major
> Labels: pull-request-available, ready-to-commit
> Fix For: 2.4.0
>
> Attachments: Active-Active XDCR setup.png
>
> Implement a drop-in replacement for MirrorMaker leveraging the Connect framework.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0]
> [https://github.com/apache/kafka/pull/6295]

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)
[ https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931743#comment-16931743 ] Qihong Chen edited comment on KAFKA-7500 at 9/17/19 7:22 PM:

Hi [~ryannedolan], I just listened to your Kafka Power Chat talk, thanks! I have a follow-up question about the last question (from me) you answered in the talk.

You said you prefer a dedicated MM2 cluster over running MM2 in a Connect cluster, since fewer clusters are needed to replicate among multiple Kafka clusters. But there's no REST API for a dedicated MM2 cluster that can report the status of the replication streams or update the replication configuration. Any configuration change means updating the config files and restarting all MM2 instances. Is that right? Or did I miss that a dedicated MM2 cluster does provide a REST API for administration and monitoring? If so, where is it?

If my understanding is correct, we can achieve the same thing with MM2 in a Connect cluster. Assume there are three Kafka clusters: A, B, and C. Set up a Connect cluster against C (meaning all topics for the connectors' data and state go to cluster C), then set up MM2 connectors to replicate data and metadata A -> B and B -> A. If this works, we can use Kafka cluster C plus the Connect cluster running against it to replicate data among even more Kafka clusters, such as A, B, and D. Of course, this needs a more complicated configuration, which requires a deeper understanding of how the MM2 connectors work. In this scenario, the Connect cluster provides a REST API to administer and monitor all the connectors. This would be useful for people who can't use Streams Replication Manager from Cloudera or Replicator from Confluent for some reason. Is this right?

-- This message was sent by Atlassian Jira (v8.3.2#803003)
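To make the Connect-cluster option above concrete, here is a sketch of a connector config that could be POSTed to a Connect worker's REST API (adapted from the examples in KIP-382; treat the exact keys, the connector name, and the A/B aliases and broker addresses as illustrative):

```
{
  "name": "mm2-a-to-b",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "A",
    "target.cluster.alias": "B",
    "source.cluster.bootstrap.servers": "a-broker1:9092,a-broker2:9092",
    "topics": "test.*"
  }
}
```

A MirrorCheckpointConnector (and, if heartbeats are wanted, a MirrorHeartbeatConnector) would be configured the same way for each flow, and the Connect cluster's REST API then reports status for each connector individually.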
[jira] [Comment Edited] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)
[ https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927863#comment-16927863 ] Qihong Chen edited comment on KAFKA-7500 at 9/11/19 6:13 PM:

[~ryannedolan] Thanks for all your efforts on MM2. I'd appreciate your input on the following questions.

I'm trying to replicate topics from one cluster to another, including the topic data and the related consumers' offsets. But I only see the topic data replicated, not the consumer offsets. Here's my mm2.properties:

{code:java}
clusters = dr1, dr2
dr1.bootstrap.servers = 10.1.0.4:9092,10.1.0.5:9092,10.1.0.6:9092
dr2.bootstrap.servers = 10.2.0.4:9092,10.2.0.5:9092,10.2.0.6:9092

# only allow replication dr1 -> dr2
dr1->dr2.enabled = true
dr1->dr2.topics = test.*
dr1->dr2.groups = test.*
dr1->dr2.emit.heartbeats.enabled = false
dr2->dr1.enabled = false
dr2->dr1.emit.heartbeats.enabled = false
{code}

Here's how I started the MM2 cluster (dr2 as the nearby cluster):

{code:java}
nohup bin/connect-mirror-maker.sh mm2.properties --clusters dr2 > mm2.log 2>&1 &
{code}

On dr1, there's a topic *test1* and a consumer group *test1grp* for topic test1. On dr2, I found the following topics:

{code:java}
__consumer_offsets
dr1.checkpoints.internal
dr1.test1
heartbeats
mm2-configs.dr1.internal
mm2-offsets.dr1.internal
mm2-status.dr1.internal
{code}

But I couldn't find any consumer group on dr2 related to consumer group *test1grp*. Could you please explain in detail how to migrate consumer group *test1grp* from dr1 to dr2, i.e. what command(s) need to run to set up the offsets for *test1grp* on dr2 before consuming topic *dr1.test1*?

By the way, how do I set this up and run it in a Kafka Connect cluster, i.e. how do I configure MirrorSourceConnector and MirrorCheckpointConnector in a Connect cluster? Is there documentation about this?

-- This message was sent by Atlassian Jira (v8.3.2#803003)
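For context on the offset question above: the dr1.checkpoints.internal topic is where MM2 records the group-offset mappings. The sketch below shows the translation idea only, with a hypothetical Checkpoint class standing in for the records MirrorCheckpointConnector emits; it is not the Kafka API (in Kafka 2.4 the actual entry point is org.apache.kafka.connect.mirror.RemoteClusterUtils.translateOffsets).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Self-contained sketch of MM2's checkpoint-based offset translation.
// Checkpoint is a hypothetical stand-in for the records MirrorCheckpointConnector
// writes to <source>.checkpoints.internal on the target cluster.
class CheckpointSketch {
    static final class Checkpoint {
        final String group;          // consumer group on the source cluster
        final String remoteTopic;    // renamed topic on the target, e.g. "dr1.test1"
        final int partition;
        final long downstreamOffset; // equivalent offset on the target cluster
        Checkpoint(String group, String remoteTopic, int partition, long downstreamOffset) {
            this.group = group; this.remoteTopic = remoteTopic;
            this.partition = partition; this.downstreamOffset = downstreamOffset;
        }
    }

    // Build the offsets a consumer on the target cluster should seek to for `group`
    // before it starts reading the replicated topic.
    static Map<String, Long> translate(List<Checkpoint> checkpoints, String group) {
        Map<String, Long> offsets = new HashMap<>();
        for (Checkpoint c : checkpoints) {
            if (c.group.equals(group)) {
                offsets.put(c.remoteTopic + "-" + c.partition, c.downstreamOffset);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<Checkpoint> cps = new ArrayList<>();
        cps.add(new Checkpoint("test1grp", "dr1.test1", 0, 42L));
        cps.add(new Checkpoint("othergrp", "dr1.test1", 0, 7L));
        System.out.println(translate(cps, "test1grp")); // {dr1.test1-0=42}
    }
}
```

In other words, a small client would read the checkpoints, translate them, and commit the resulting offsets for test1grp on dr2 before consumers switch over.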
Re: Merging GEODE-9 Spark Connector project into develop
+1 to merge

On Fri, Jul 24, 2015 at 9:54 AM, Dan Smith wrote:
> +1 to merge.
>
> -Dan
>
> On Thu, Jul 23, 2015 at 11:44 PM, Jianxia Chen wrote:
> > +1 for merge
> >
> > On Thu, Jul 23, 2015 at 4:27 PM, Anilkumar Gingade wrote:
> > > +1 for merge.
> > >
> > > On Thu, Jul 23, 2015 at 10:19 AM, Jason Huynh wrote:
> > > > Greetings,
> > > >
> > > > We are hoping to merge GEODE-9 into develop. GEODE-9 is the feature work for the gemfire/geode spark connector. This work had previously been done on a private repo prior to Geode being in incubation and is quite large.
> > > >
> > > > This merge will create a subproject in Geode named gemfire-spark-connector. This project uses sbt to do its build and has not yet been connected to the Geode build system. There will be future work to better incorporate it with the gradle build, as well as to remove the geode jar dependency.
> > > >
> > > > This project has a separate set of readme/tutorial docs as well as its own tests and integration tests. These also have not been integrated with the automated testing and will need to be included at some point.
> > > >
> > > > The hope was to get this merged in and do the remaining work in smaller, easier-to-digest chunks, as well as possibly getting other contributors to help with these efforts.
> > > >
> > > > Currently there is a review for this entire change at: https://reviews.apache.org/r/36731/
> > > > It will probably be easier to just get a checkout of the branch to see what it looks like.
> > > >
> > > > Please voice any concerns, suggestions or questions on this thread.
> > > >
> > > > Thanks!
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
Forgot to ask: do we need the .gitignore file? There's only one line, "/bin/", in it. Thanks!
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36731/#review92775
---

Ship it!

Ship It!

- Qihong Chen

On July 23, 2015, 4:29 p.m., Jason Huynh wrote:
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36731/
> ---
>
> (Updated July 23, 2015, 4:29 p.m.)
>
> Review request for geode, anilkumar gingade, Bruce Schuchardt, Jianxia Chen, Lynn Gallinat, and Qihong Chen.
>
> Repository: geode
>
> Description
> ---
> The diff is of all the files in the GEODE-9 feature branch. It will probably be easier just to get a checkout of the feature branch and peruse the gemfire-spark-connector directory. We hope to merge this in and make future changes to fully integrate the connector into the gradle build at a future time.
>
> Diffs
> -
> gemfire-spark-connector/.gitignore PRE-CREATION
> gemfire-spark-connector/README.md PRE-CREATION
> gemfire-spark-connector/doc/10_demos.md PRE-CREATION
> gemfire-spark-connector/doc/1_building.md PRE-CREATION
> gemfire-spark-connector/doc/2_quick.md PRE-CREATION
> gemfire-spark-connector/doc/3_connecting.md PRE-CREATION
> gemfire-spark-connector/doc/4_loading.md PRE-CREATION
> gemfire-spark-connector/doc/5_rdd_join.md PRE-CREATION
> gemfire-spark-connector/doc/6_save_rdd.md PRE-CREATION
> gemfire-spark-connector/doc/7_save_dstream.md PRE-CREATION
> gemfire-spark-connector/doc/8_oql.md PRE-CREATION
> gemfire-spark-connector/doc/9_java_api.md PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/RegionMetadata.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/QueryFunction.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/RetrieveRegionFunction.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/RetrieveRegionMetadataFunction.java PRE-CREATION
> gemfire-spark-connector/gemfire-functions/src/main/java/io/pivotal/gemfire/spark/connector/internal/gemfirefunctions/StructStreamingResultSender.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/Employee.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/JavaApiIntegrationTest.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/Portfolio.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/java/ittest/io/pivotal/gemfire/spark/connector/Position.java PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/resources/test-regions.xml PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/resources/test-retrieve-regions.xml PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/BasicIntegrationTest.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/RDDJoinRegionIntegrationTest.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/RetrieveRegionIntegrationTest.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/package.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/testkit/GemFireCluster.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/testkit/GemFireRunner.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/ittest/io/pivotal/gemfire/spark/connector/testkit/IOUtils.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/src/it/scala/org/apache/spark/streaming/ManualClockHelper.scala PRE-CREATION
> gemfire-spark-connector/gemfire-spark-connector/
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
I'm not talking about changes to the GEODE-9 feature branch. I just want to know whether this merge simply copies all files from the GEODE-9 feature branch to the develop branch. Thanks!
Re: Review Request 36731: GEODE-9 : Merging in spark connector sub project
Hi Jason, are there any "real" changes from the GEODE-9 feature branch other than adding all the files to the develop branch? Thanks, Qihong
[jira] [Created] (GEODE-137) Spark Connector: should connect to local GemFire server if possible
Qihong Chen created GEODE-137:
-

Summary: Spark Connector: should connect to local GemFire server if possible
Key: GEODE-137
URL: https://issues.apache.org/jira/browse/GEODE-137
Project: Geode
Issue Type: Bug
Reporter: Qihong Chen
Assignee: Qihong Chen

DefaultGemFireConnection uses ClientCacheFactory with locator info to create the ClientCache instance. In this case, the ClientCache doesn't connect to the GemFire/Geode server on the same host even if there is one. This causes more network traffic and is less efficient. ClientCacheFactory can also create a ClientCache from GemFire server info directly. Therefore, we can force the ClientCache to connect to the local GemFire server when possible.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
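A minimal sketch of the selection logic being proposed: prefer a server on the local host when building the pool against servers rather than locators. LocalServerPicker and pickServer() are hypothetical, not connector code; the real change would presumably feed the chosen server to ClientCacheFactory.addPoolServer instead of addPoolLocator.

```java
import java.util.List;

// Sketch of the server-selection idea in GEODE-137: given the servers a locator
// returns, prefer one running on this host so reads and writes stay local.
class LocalServerPicker {
    // Return the first "host:port" entry whose host part matches `localHost`,
    // falling back to the first server in the list.
    static String pickServer(List<String> servers, String localHost) {
        for (String s : servers) {
            if (s.split(":")[0].equals(localHost)) return s;
        }
        return servers.get(0);
    }

    public static void main(String[] args) {
        List<String> servers = List.of("10.1.0.4:40404", "10.1.0.5:40404");
        System.out.println(pickServer(servers, "10.1.0.5")); // 10.1.0.5:40404
    }
}
```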
[jira] [Resolved] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M entries per partition)
[ https://issues.apache.org/jira/browse/GEODE-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen resolved GEODE-120.
---
Resolution: Fixed

> RDD.saveToGemfire() can not handle big dataset (1M entries per partition)
> -
>
> Key: GEODE-120
> URL: https://issues.apache.org/jira/browse/GEODE-120
> Project: Geode
> Issue Type: Sub-task
> Components: core, extensions
> Affects Versions: 1.0.0-incubating
> Reporter: Qihong Chen
> Assignee: Qihong Chen
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> The connector uses a single region.putAll() call to save each RDD partition, but putAll() doesn't handle big datasets well (such as 1M entries). We need to split the dataset into smaller chunks and invoke putAll() for each chunk.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
[ https://issues.apache.org/jira/browse/GEODE-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen closed GEODE-114.
-

> There's race condition in DefaultGemFireConnection.getRegionProxy
> -
>
> Key: GEODE-114
> URL: https://issues.apache.org/jira/browse/GEODE-114
> Project: Geode
> Issue Type: Sub-task
> Components: core, extensions
> Affects Versions: 1.0.0-incubating
> Reporter: Qihong Chen
> Assignee: Qihong Chen
> Fix For: 1.0.0-incubating
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When multiple threads call getRegionProxy with the same region at the same time, the following exception is thrown:
>
> com.gemstone.gemfire.cache.RegionExistsException: /debs
>   at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2880)
>   at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2835)
>   at com.gemstone.gemfire.cache.client.internal.ClientRegionFactoryImpl.create(ClientRegionFactoryImpl.java:223)
>   at io.pivotal.gemfire.spark.connector.internal.DefaultGemFireConnection.getRegionProxy(DefaultGemFireConnection.scala:87)
>   at io.pivotal.gemfire.spark.connector.internal.rdd.GemFirePairRDDWriter.write(GemFireRDDWriter.scala:47)
>   at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
>   at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:64)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Subscription request
Send an empty email to user-subscribe@geode.incubator.apache.org.
[jira] [Resolved] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
[ https://issues.apache.org/jira/browse/GEODE-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen resolved GEODE-114.
---
Resolution: Fixed
Fix Version/s: 1.0.0-incubating

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M entries per partition)
[ https://issues.apache.org/jira/browse/GEODE-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen updated GEODE-120:
--
Summary: RDD.saveToGemfire() can not handle big dataset (1M entries per partition) (was: RDD.saveToGemfire() can not handle big dataset (1M record per partition))

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Unable to start locator locally
Make sure JAVA_HOME is set in the terminal you used for gfsh. I got a similar error the other day. Remember to kill all Geode processes before trying again. Qihong
[jira] [Commented] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M record per partition)
[ https://issues.apache.org/jira/browse/GEODE-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628902#comment-14628902 ] Qihong Chen commented on GEODE-120:
---
Posted code review request: https://reviews.apache.org/r/36530/

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GEODE-120) RDD.saveToGemfire() can not handle big dataset (1M record per partition)
Qihong Chen created GEODE-120:
-

Summary: RDD.saveToGemfire() can not handle big dataset (1M record per partition)
Key: GEODE-120
URL: https://issues.apache.org/jira/browse/GEODE-120
Project: Geode
Issue Type: Sub-task
Affects Versions: 1.0.0-incubating
Reporter: Qihong Chen
Assignee: Qihong Chen

The connector uses a single region.putAll() call to save each RDD partition, but putAll() doesn't handle big datasets well (such as 1M records). We need to split the dataset into smaller chunks and invoke putAll() for each chunk.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
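The fix described above, chunking the partition before calling putAll(), can be sketched with a minimal stand-in for the GemFire Region API. The Region interface, savePartition() helper, and chunk size below are illustrative, not connector code:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the GEODE-120 fix: instead of one region.putAll() per RDD partition,
// split the partition's entries into bounded chunks and call putAll() per chunk.
class ChunkedPutAll {
    // Hypothetical minimal interface standing in for GemFire's Region.
    interface Region<K, V> { void putAll(Map<K, V> entries); }

    // Save `entries` in chunks of at most `chunkSize`; returns the number of
    // putAll() invocations performed.
    static <K, V> int savePartition(Region<K, V> region, Map<K, V> entries, int chunkSize) {
        int calls = 0;
        Map<K, V> chunk = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            chunk.put(e.getKey(), e.getValue());
            if (chunk.size() == chunkSize) {   // flush a full chunk
                region.putAll(chunk);
                chunk = new LinkedHashMap<>();
                calls++;
            }
        }
        if (!chunk.isEmpty()) { region.putAll(chunk); calls++; }  // flush the remainder
        return calls;
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new LinkedHashMap<>();
        for (int i = 0; i < 2500; i++) data.put(i, "v" + i);
        Map<Integer, String> sink = new HashMap<>();
        int calls = savePartition(sink::putAll, data, 1000);
        System.out.println(calls + " putAll calls, " + sink.size() + " entries saved");
        // 3 putAll calls, 2500 entries saved
    }
}
```

Bounding each putAll() keeps every server-side batch small regardless of how large the RDD partition is, which is the point of the fix.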
[jira] [Updated] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
[ https://issues.apache.org/jira/browse/GEODE-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qihong Chen updated GEODE-114:
------------------------------

Description:

When multiple threads call getRegionProxy with the same region at the same time, the following exception is thrown:

com.gemstone.gemfire.cache.RegionExistsException: /debs
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2880)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2835)
	at com.gemstone.gemfire.cache.client.internal.ClientRegionFactoryImpl.create(ClientRegionFactoryImpl.java:223)
	at io.pivotal.gemfire.spark.connector.internal.DefaultGemFireConnection.getRegionProxy(DefaultGemFireConnection.scala:87)
	at io.pivotal.gemfire.spark.connector.internal.rdd.GemFirePairRDDWriter.write(GemFireRDDWriter.scala:47)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

was:

When multiple threads call getRegionProxy with the same region, the following exception is thrown: (same stack trace as above)

> There's race condition in DefaultGemFireConnection.getRegionProxy
> -----------------------------------------------------------------
>
>                 Key: GEODE-114
>                 URL: https://issues.apache.org/jira/browse/GEODE-114
>             Project: Geode
>          Issue Type: Sub-task
>          Components: core, extensions
>    Affects Versions: 1.0.0-incubating
>            Reporter: Qihong Chen
>            Assignee: Qihong Chen
>   Original Estimate: 24h
>  Remaining Estimate: 24h

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GEODE-114) There's race condition in DefaultGemFireConnection.getRegionProxy
Qihong Chen created GEODE-114:
---------------------------------

             Summary: There's race condition in DefaultGemFireConnection.getRegionProxy
                 Key: GEODE-114
                 URL: https://issues.apache.org/jira/browse/GEODE-114
             Project: Geode
          Issue Type: Sub-task
    Affects Versions: 1.0.0-incubating
            Reporter: Qihong Chen
            Assignee: Qihong Chen

When multiple threads call getRegionProxy with the same region, the following exception is thrown:

com.gemstone.gemfire.cache.RegionExistsException: /debs
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:2880)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2835)
	at com.gemstone.gemfire.cache.client.internal.ClientRegionFactoryImpl.create(ClientRegionFactoryImpl.java:223)
	at io.pivotal.gemfire.spark.connector.internal.DefaultGemFireConnection.getRegionProxy(DefaultGemFireConnection.scala:87)
	at io.pivotal.gemfire.spark.connector.internal.rdd.GemFirePairRDDWriter.write(GemFireRDDWriter.scala:47)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at io.pivotal.gemfire.spark.connector.GemFirePairRDDFunctions$$anonfun$saveToGemfire$2.apply(GemFirePairRDDFunctions.scala:24)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
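The bug described above is a classic check-then-create race: two threads both observe that the region proxy does not exist yet, both attempt to create it, and the loser gets RegionExistsException from createVMRegion. As a minimal, hypothetical Java sketch (the real fix lives in DefaultGemFireConnection.scala and goes through GemFire's ClientRegionFactory, which is not reproduced here; RegionProxyCache, RegionProxy, and the creation counter are illustrative names), the two-step lookup-then-create can be made atomic per region name with ConcurrentHashMap.computeIfAbsent:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the connector's region-proxy lookup; not the
// real GemFire API, just the concurrency pattern.
class RegionProxyCache {
    static class RegionProxy {
        final String name;
        RegionProxy(String name) { this.name = name; }
    }

    private final ConcurrentMap<String, RegionProxy> regions = new ConcurrentHashMap<>();
    final AtomicInteger creations = new AtomicInteger(); // counts factory invocations

    // computeIfAbsent invokes the factory atomically, at most once per key,
    // so concurrent callers for the same region share one proxy instead of
    // racing to create it twice (the race that raised RegionExistsException).
    RegionProxy getRegionProxy(String name) {
        return regions.computeIfAbsent(name, n -> {
            creations.incrementAndGet();
            return new RegionProxy(n); // the real code would create the client region here
        });
    }
}
```

An alternative that avoids holding the per-key lock during region creation is to attempt the create optimistically, catch RegionExistsException, and fall back to looking the existing region up; either way, the fix is to make "check, then create" atomic.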
Re: Review Request 36408: Resolve the Geode Spark Connector Build Issue
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36408/#review91625
-----------------------------------------------------------

Ship it!

Ship It!

- Qihong Chen


On July 10, 2015, 11:32 p.m., Jianxia Chen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36408/
> -----------------------------------------------------------
> 
> (Updated July 10, 2015, 11:32 p.m.)
> 
> 
> Review request for geode, anilkumar gingade, Bruce Schuchardt, Jason Huynh, and Qihong Chen.
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> Remove the dependency on Pivotal's internal repo.
> 
> Resolve the conflict of different netty versions between Spark and Geode.
> 
> To build the Geode Spark Connector, first build Geode and publish the jars to the local repo, then build the connector using sbt.
> 
> 
> Diffs
> -----
> 
>   gemfire-spark-connector/doc/1_building.md ece4a9c 
>   gemfire-spark-connector/project/Dependencies.scala 899e182 
>   gemfire-spark-connector/project/Settings.scala ec61884 
> 
> Diff: https://reviews.apache.org/r/36408/diff/
> 
> 
> Testing
> -------
> 
> sbt test it:test
> 
> 
> Thanks,
> 
> Jianxia Chen
> 
Re: Where to place "Spark + GemFire" connector.
The problem is caused by having multiple major dependencies with different release cycles. The Spark Geode Connector depends on two products, Spark and Geode (not counting other dependencies); Spark moves much faster than Geode, and some of its features/code are not backward compatible. Our initial connector implementation depended on Spark 1.2, up until the last week of March 2015. Then Spark 1.3 was released in the last week of March, and some connector features didn't work with Spark 1.3, so we moved on; we now support Spark 1.3 (but not 1.2 any more, though we did create a tag). Two weeks ago, Spark 1.4 was released, and it breaks our connector code again. Therefore, for each Geode release we will probably need multiple Connector releases, and will probably need to maintain the last 2 or 3 Connector releases; for example, we need to support both Spark 1.3 and 1.4 with the current Geode code. The question is how to support this with a single source repository?

Thanks,
Qihong