jon-wei commented on a change in pull request #6360: overhaul 'druid-parquet-extensions' module, promoting from 'contrib' to 'core'
URL: https://github.com/apache/incubator-druid/pull/6360#discussion_r224633964
 
 

 ##########
 File path: extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/DruidParquetReadSupport.java
 ##########
 @@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.parquet.simple;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import org.apache.druid.data.input.impl.DimensionSchema;
+import org.apache.druid.data.input.impl.ParseSpec;
+import org.apache.druid.indexer.HadoopDruidIndexerConfig;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.parquet.hadoop.api.InitContext;
+import org.apache.parquet.hadoop.example.GroupReadSupport;
+import org.apache.parquet.schema.MessageType;
+import org.apache.parquet.schema.Type;
+
+import java.util.List;
+import java.util.Set;
+
+public class DruidParquetReadSupport extends GroupReadSupport
+{
+  /**
+   * Select the columns from the Parquet schema that are used in the schema of the ingestion job
+   *
+   * @param context The context of the file to be read
+   *
+   * @return the partial schema that contains only the columns used in the ingestion schema
+   */
+  private MessageType getPartialReadSchema(InitContext context)
+  {
+    MessageType fullSchema = context.getFileSchema();
+
+    String name = fullSchema.getName();
+
+    HadoopDruidIndexerConfig config = HadoopDruidIndexerConfig.fromConfiguration(context.getConfiguration());
+    ParseSpec parseSpec = config.getParser().getParseSpec();
+
+    // todo: this is kind of lame, maybe we can still trim what we read if we
 
 Review comment:
   Hm, rather than parsing the flattenSpec here, maybe this could be supported with a "requiredFields" method on flatten specs; either way, I would remove the "todo" part for now.
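
   To make the suggestion concrete, here is a minimal sketch of the idea, assuming a hypothetical `requiredFields()` hook on flatten specs; none of these names exist in the PR as written:

```java
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.Type;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical: a flattenSpec could report which top-level input fields
// it references, so the read support never has to parse the spec itself.
interface FlattenFieldsAware
{
  Set<String> requiredFields();
}

class PartialSchemaSketch
{
  // Keep only the top-level Parquet fields whose names appear in the
  // required set (dimensions, metrics, the timestamp column, plus
  // whatever the flattenSpec reports via requiredFields()).
  static MessageType prune(MessageType fullSchema, Set<String> required)
  {
    List<Type> kept = new ArrayList<>();
    for (Type field : fullSchema.getFields()) {
      if (required.contains(field.getName())) {
        kept.add(field);
      }
    }
    return new MessageType(fullSchema.getName(), kept);
  }
}
```

   With something like that in place, `getPartialReadSchema` could stay oblivious to flattenSpec internals and simply union the fields each part of the parse spec asks for.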

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
