frankgh commented on code in PR #41:
URL: 
https://github.com/apache/cassandra-analytics/pull/41#discussion_r1509715774


##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkWriterContext.java:
##########
@@ -21,7 +21,9 @@
 
 import java.io.Serializable;
 
-public interface BulkWriterContext extends Serializable
+import org.apache.cassandra.spark.common.Reportable;
+
+public interface BulkWriterContext extends Serializable, Reportable

Review Comment:
   I don't think this belongs in the `BulkWriterContext`. The context is 
something we share with executors. However, the job stats are tracked from the 
driver only. I would avoid adding anything else to the `BulkWriterContext`.



##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/common/Reportable.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.cassandra.spark.common;

Review Comment:
   Should this be in a stats or metrics package instead?



##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RingInstance.java:
##########
@@ -49,6 +49,7 @@ public RingInstance(ReplicaMetadata replica)
                          .datacenter(replica.datacenter())
                          .state(replica.state())
                          .status(replica.status())
+                         .token("")

Review Comment:
   Why this change?



##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/CassandraBulkSourceRelation.java:
##########
@@ -107,17 +112,25 @@ private void persist(@NotNull JavaPairRDD<DecoratedKey, Object[]> sortedRDD, Str
     {
         try
         {
-            sortedRDD.foreachPartition(writeRowsInPartition(broadcastContext, columnNames));
+            List<StreamResult> results = sortedRDD
+                                         .mapPartitions(partitionsFlatMapFunc(broadcastContext, columnNames))
+                                         .collect();
+            long rowCount = results.stream().mapToLong(res -> res.rowCount).sum();
+            long totalBytesWritten = results.stream().mapToLong(res -> res.bytesWritten).sum();
+            LOGGER.info("Bulk writer has written {} rows and {} bytes", rowCount, totalBytesWritten);
+            recordSuccessfulJobStats(rowCount, totalBytesWritten);
         }
         catch (Throwable throwable)
         {
+            recordFailureStats(throwable.getMessage());
             LOGGER.error("Bulk Write Failed", throwable);
             throw new RuntimeException("Bulk Write to Cassandra has failed", throwable);
         }
         finally
         {
             try
             {
+                writerContext.publishJobStats();

Review Comment:
   I don't think this will do what you think it will do. This object _should_ 
be treated as a read-only object from the point of view of the executors. Data 
written to it on the executors won't propagate back to the driver once the 
executors complete.
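
To illustrate the point about executor-side copies, here is a minimal sketch in plain Java, with no Spark involved. It assumes the object reaches each executor as a serialized copy and imitates that with a serialize/deserialize round trip. `ExecutorCopyDemo`, `Context`, `rowsWritten`, and `shipToExecutor` are hypothetical names used for illustration only; they are not part of this PR or of the Spark API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ExecutorCopyDemo
{
    // Hypothetical stand-in for a context object that is shipped to executors
    static class Context implements Serializable
    {
        private static final long serialVersionUID = 1L;
        long rowsWritten = 0;
    }

    // Imitates shipping an object to an executor: serialize it, then deserialize a separate copy
    static Context shipToExecutor(Context driverCopy) throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes))
        {
            out.writeObject(driverCopy);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
        {
            return (Context) in.readObject();
        }
    }

    // Returns the driver-side counter after the "executor" has mutated its own copy
    static long driverRowsAfterExecutorUpdate() throws IOException, ClassNotFoundException
    {
        Context driverCopy = new Context();
        Context executorCopy = shipToExecutor(driverCopy);
        executorCopy.rowsWritten = 1000; // the update lands on the executor's copy only
        return driverCopy.rowsWritten;   // the driver's copy is untouched
    }

    public static void main(String[] args) throws Exception
    {
        // prints "driver sees rowsWritten = 0"
        System.out.println("driver sees rowsWritten = " + driverRowsAfterExecutorUpdate());
    }
}
```

The mutation never reaches the driver's instance, which is why returning per-partition results and aggregating them on the driver (as the `mapPartitions(...).collect()` change in this hunk does) is the safer way to gather stats.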



##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/common/Reportable.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.cassandra.spark.common;
+
+import java.util.Map;
+
+/**
+ * Interface to provide functionality to report Spark Job Statistics and/or properties
+ * that can optionally be instrumented. The default implementation merely logs these
+ * stats at the end of the job.
+ */
+public interface Reportable

Review Comment:
   How about naming this interface `BulkWriterJobStats`?



##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/RingInstance.java:
##########
@@ -125,40 +126,28 @@ private void writeObject(ObjectOutputStream out) throws IOException
         out.writeUTF(ringEntry.address());
         out.writeInt(ringEntry.port());
         out.writeUTF(ringEntry.datacenter());
-        out.writeUTF(ringEntry.load());

Review Comment:
   Why do we remove these?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


