weibozhao commented on code in PR #156:
URL: https://github.com/apache/flink-ml/pull/156#discussion_r978481060


##########
flink-ml-lib/src/main/java/org/apache/flink/ml/feature/vectorassembler/VectorAssembler.java:
##########
@@ -74,38 +74,65 @@ public Table[] transform(Table... inputs) {
         DataStream<Row> output =
                 tEnv.toDataStream(inputs[0])
                         .flatMap(
-                                new AssemblerFunc(getInputCols(), getHandleInvalid()),
+                                new AssemblerFunction(
+                                        getInputCols(), getHandleInvalid(), getSizes()),
                                 outputTypeInfo);
         Table outputTable = tEnv.fromDataStream(output);
         return new Table[] {outputTable};
     }
 
-    private static class AssemblerFunc implements FlatMapFunction<Row, Row> {
+    private static class AssemblerFunction implements FlatMapFunction<Row, Row> {
         private final String[] inputCols;
         private final String handleInvalid;
+        private final int[] sizeArray;
 
-        public AssemblerFunc(String[] inputCols, String handleInvalid) {
+        public AssemblerFunction(String[] inputCols, String handleInvalid, int[] sizeArray) {
             this.inputCols = inputCols;
             this.handleInvalid = handleInvalid;
+            this.sizeArray = sizeArray;
         }
 
         @Override
         public void flatMap(Row value, Collector<Row> out) {
             int nnz = 0;
             int vectorSize = 0;
             try {
-                for (String inputCol : inputCols) {
+                for (int i = 0; i < inputCols.length; ++i) {
+                    String inputCol = inputCols[i];
                     Object object = value.getField(inputCol);
                     Preconditions.checkNotNull(object, "Input column value should not be null.");
                     if (object instanceof Number) {
+                        Preconditions.checkArgument(

Review Comment:
   I can't know in advance which record has a wrong size, so the sizes must be checked for every record.
   Even when the declared sizes are wrong, the code can still run without failing, which is why every record needs to be checked.
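   To illustrate the point, here is a minimal plain-Java sketch; it is not the PR's code. `SizeCheckedAssembler`, `assembleUnchecked` and `assembleChecked` are hypothetical names, a `Number` stands in for a scalar column and a `double[]` stands in for a Flink ML `Vector` column. Without a per-record check, a record whose vector length disagrees with the declared size is assembled with no error at all; with the check, that record is rejected as soon as it is processed.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Flink ML code): assembles one record's fields into a
 * dense double[] and optionally validates each field against the declared sizes.
 * A Number models a scalar column; a double[] models a vector column.
 */
public class SizeCheckedAssembler {
    private final int[] sizeArray;

    public SizeCheckedAssembler(int[] sizeArray) {
        this.sizeArray = sizeArray;
    }

    /** Concatenates the fields with no size check, so a wrong declared size goes unnoticed. */
    public double[] assembleUnchecked(Object[] fields) {
        int total = 0;
        for (Object field : fields) {
            total += (field instanceof Number) ? 1 : ((double[]) field).length;
        }
        double[] out = new double[total];
        int offset = 0;
        for (Object field : fields) {
            if (field instanceof Number) {
                out[offset++] = ((Number) field).doubleValue();
            } else {
                double[] vec = (double[]) field;
                System.arraycopy(vec, 0, out, offset, vec.length);
                offset += vec.length;
            }
        }
        return out;
    }

    /** Checks every field of this record against its declared size before assembling. */
    public double[] assembleChecked(Object[] fields) {
        if (fields.length != sizeArray.length) {
            throw new IllegalArgumentException(
                    "Expected " + sizeArray.length + " columns but got " + fields.length);
        }
        for (int i = 0; i < fields.length; ++i) {
            Object field = fields[i];
            if (field == null) {
                throw new NullPointerException("Input column value should not be null.");
            }
            int actual = (field instanceof Number) ? 1 : ((double[]) field).length;
            if (actual != sizeArray[i]) {
                throw new IllegalArgumentException(
                        "Column " + i + " has size " + actual + " but declared size is " + sizeArray[i]);
            }
        }
        return assembleUnchecked(fields);
    }

    public static void main(String[] args) {
        // Declared sizes: one scalar column, then a vector column of size 2.
        SizeCheckedAssembler assembler = new SizeCheckedAssembler(new int[] {1, 2});
        Object[] good = {3.0, new double[] {1.0, 2.0}};
        Object[] bad = {3.0, new double[] {1.0, 2.0, 5.0}};

        System.out.println(Arrays.toString(assembler.assembleChecked(good)));
        // Without the check, the mis-sized record is silently assembled into a length-4 vector.
        System.out.println(Arrays.toString(assembler.assembleUnchecked(bad)));
        try {
            assembler.assembleChecked(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

   Because any single record can be the one with the wrong size, the check has to run inside the per-record function rather than once up front.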


