[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274702#comment-16274702
 ] 

ASF GitHub Bot commented on APEXMALHAR-2550:
--------------------------------------------

tweise closed pull request #680: APEXMALHAR-2550 Made NycTaxiDataReader and NycTaxiCsvParser more resi…
URL: https://github.com/apache/apex-malhar/pull/680
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiCsvParser.java b/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiCsvParser.java
index 3e13e7640b..7ecf62a844 100644
--- a/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiCsvParser.java
+++ b/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiCsvParser.java
@@ -21,6 +21,9 @@
 import java.util.HashMap;
 import java.util.Map;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import org.apache.commons.lang3.StringUtils;
 
 import com.datatorrent.api.DefaultInputPort;
@@ -40,17 +43,20 @@
     @Override
     public void process(String tuple)
     {
-      String[] values = tuple.split(",");
+      String[] values = tuple.split(",", -1);
       Map<String, String> outputTuple = new HashMap<>();
-      if (StringUtils.isNumeric(values[0])) {
+      if (values.length > 18 && StringUtils.isNumeric(values[0])) {
         outputTuple.put("pickup_time", values[1]);
         outputTuple.put("pickup_lon", values[5]);
         outputTuple.put("pickup_lat", values[6]);
         outputTuple.put("total_fare", values[18]);
         output.emit(outputTuple);
+      } else {
+        LOG.warn("Dropping tuple with unrecognized format: {}", tuple);
       }
     }
   };
 
   public final transient DefaultOutputPort<Map<String, String>> output = new DefaultOutputPort<>();
+  private static final Logger LOG = LoggerFactory.getLogger(NycTaxiCsvParser.class);
 }
diff --git a/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiDataReader.java b/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiDataReader.java
index b2168f458e..1d4114cba3 100644
--- a/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiDataReader.java
+++ b/examples/nyctaxi/src/main/java/org/apache/apex/examples/nyctaxi/NycTaxiDataReader.java
@@ -54,14 +54,16 @@ protected boolean suspendEmit()
   protected String readEntity() throws IOException
   {
     String line = super.readEntity();
-    String[] fields = line.split(",");
-    String timestamp = fields[1];
-    if (currentTimestamp == null) {
-      currentTimestamp = timestamp;
-    } else if (timestamp != currentTimestamp) {
-      // suspend emit until the next streaming window when timestamp is different from the current timestamp.
-      suspendEmit = true;
-      currentTimestamp = timestamp;
+    String[] fields = line.split(",", -1);
+    if (fields.length > 1) {
+      String timestamp = fields[1];
+      if (currentTimestamp == null) {
+        currentTimestamp = timestamp;
+      } else if (timestamp != currentTimestamp) {
+        // suspend emit until the next streaming window when timestamp is different from the current timestamp.
+        suspendEmit = true;
+        currentTimestamp = timestamp;
+      }
     }
     return line;
   }
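
The two guards in the diff above can be read together as one pattern: split with a limit of -1, check the field count before indexing, and drop anything that does not look like a data row. The following is a minimal standalone sketch of that pattern, not the merged code. The class name `NycTaxiLineSketch`, the helper `isDigits` (a stand-in for `StringUtils.isNumeric`), and the method `timestampChanged` are hypothetical names introduced for illustration. Note one deliberate deviation: the merged diff keeps the `timestamp != currentTimestamp` check, which in Java compares object references rather than string contents, whereas this sketch compares contents with `Objects.equals`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

public class NycTaxiLineSketch {
  // Highest column index read by the parser in the diff (total_fare).
  static final int TOTAL_FARE_INDEX = 18;

  // Returns the extracted fields, or empty if the line is not a recognized data row.
  static Optional<Map<String, String>> parse(String line) {
    // A limit of -1 keeps trailing empty fields instead of discarding them.
    String[] values = line.split(",", -1);
    if (values.length <= TOTAL_FARE_INDEX || !isDigits(values[0])) {
      return Optional.empty();
    }
    Map<String, String> out = new HashMap<>();
    out.put("pickup_time", values[1]);
    out.put("pickup_lon", values[5]);
    out.put("pickup_lat", values[6]);
    out.put("total_fare", values[TOTAL_FARE_INDEX]);
    return Optional.of(out);
  }

  // Hypothetical stand-in for StringUtils.isNumeric: non-empty and digits only.
  static boolean isDigits(String s) {
    if (s.isEmpty()) {
      return false;
    }
    for (int i = 0; i < s.length(); i++) {
      if (!Character.isDigit(s.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  // Assumption: compares timestamp contents, unlike the reference comparison in the diff.
  static boolean timestampChanged(String current, String next) {
    return current != null && !Objects.equals(current, next);
  }

  public static void main(String[] args) {
    // Build a 19-column row with only the columns the parser reads filled in.
    String[] fields = new String[TOTAL_FARE_INDEX + 1];
    Arrays.fill(fields, "");
    fields[0] = "2";
    fields[1] = "2016-01-01 00:00:00";
    fields[18] = "8.8";
    String goodRow = String.join(",", fields);

    System.out.println(parse(goodRow).isPresent());   // well-formed row is accepted
    System.out.println(parse("").isPresent());        // blank line is dropped
    System.out.println(timestampChanged("a", new String("a")));
  }
}
```

Returning an `Optional` here plays the role of the `else` branch in the diff: the caller decides whether to log and drop the line, rather than the operator failing on an out-of-range index.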


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> NycTaxiDataReader throws exception when encountering lines with unrecognized format in the NYC taxi example
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2550
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2550
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: David Yan
>            Assignee: David Yan
>
> 17/11/28 16:21:42 ERROR engine.StreamingContainer: Operator set [OperatorDeployInfo[id=1,name=NycTaxiDataReader,type=INPUT,checkpoint={ffffffffffffffff, 0, 0},inputs=[],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=output,streamId=input_to_parser,bufferServer=localhost]]]] stopped running due to an exception.
> java.lang.ArrayIndexOutOfBoundsException: 1
>       at org.apache.apex.examples.nyctaxi.NycTaxiDataReader.readEntity(NycTaxiDataReader.java:58)
>       at org.apache.apex.examples.nyctaxi.NycTaxiDataReader.readEntity(NycTaxiDataReader.java:34)
>       at com.datatorrent.lib.io.fs.AbstractFileInputOperator.emitTuples(AbstractFileInputOperator.java:684)
>       at com.datatorrent.stram.engine.InputNode.run(InputNode.java:124)
> and
> ask=0,partitionKeys=<null>]],outputs=[OperatorDeployInfo.OutputDeployInfo[portName=output,streamId=parser_to_extractor,bufferServer=localhost]]]] stopped running due to an exception.
> java.lang.ArrayIndexOutOfBoundsException: 18
>       at org.apache.apex.examples.nyctaxi.NycTaxiCsvParser$1.process(NycTaxiCsvParser.java:49)
>       at org.apache.apex.examples.nyctaxi.NycTaxiCsvParser$1.process(NycTaxiCsvParser.java:39)
>       at com.datatorrent.api.DefaultInputPort.put(DefaultInputPort.java:79)
>       at com.datatorrent.stram.stream.BufferServerSubscriber$BufferReservoir.sweep(BufferServerSubscriber.java:288)
>       at com.datatorrent.stram.engine.GenericNode.run(GenericNode.java:269)
>       at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1428)
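
Both traces come from indexing into the result of `String.split` without checking its length. The issue does not say which input lines triggered them, so the sketch below assumes a blank line and a line with trailing empty columns purely as examples. It shows how the field count differs between the default limit (0) and a limit of -1; the class name `SplitLimitDemo` and the helper `fieldCount` are hypothetical.

```java
public class SplitLimitDemo {
  // Number of fields String.split produces for the given limit.
  // line.split(",") behaves like a limit of 0 and drops trailing empty strings.
  static int fieldCount(String line, int limit) {
    return line.split(",", limit).length;
  }

  public static void main(String[] args) {
    String[] samples = {"", "2,2016-01-01 00:00:00", "a,b,,"};
    for (String s : samples) {
      System.out.println("[" + s + "] limit 0: " + fieldCount(s, 0)
          + ", limit -1: " + fieldCount(s, -1));
    }

    // A blank line yields a single empty field, so index 1 is out of range.
    try {
      String timestamp = "".split(",")[1];
      System.out.println(timestamp);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("blank line has no field at index 1");
    }
  }
}
```

A limit of -1 only preserves trailing empty columns; a blank line still splits into a single field under either limit, so the explicit length check is what actually prevents the exception.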



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
