[jira] [Commented] (HAWQ-1512) Check Apache HAWQ mandatory libraries to match LC20, LC30 license criteria

2018-02-05 Thread Radar Lei (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352435#comment-16352435
 ] 

Radar Lei commented on HAWQ-1512:
-

Thanks to [~yjin] and [~huor] for the review.

Already added googlemock.

For the Ranger dependencies, per Ruilong's comment, they are not mandatory, so
they will not be added.

For libhdfs3, I think its previous open-source home is retired and the latest
code exists only in HAWQ, so we do not need to add it. Libyarn is a similar case.

Another one I want to mention here is "libgsasl", which uses the LGPL. I did not
add it because it can be treated as a system dependency. See the previous
discussion email:
https://lists.apache.org/thread.html/5ae122b59529de58c5c668fa0e703a53ad9efb0fddb0fb26ecbcace8@%3Cdev.hawq.apache.org%3E

 

> Check Apache HAWQ mandatory libraries to match LC20, LC30 license criteria
> --
>
> Key: HAWQ-1512
> URL: https://issues.apache.org/jira/browse/HAWQ-1512
> Project: Apache HAWQ
>  Issue Type: Task
>  Components: Build
>Reporter: Yi Jin
>Assignee: Radar Lei
>Priority: Major
> Fix For: 2.3.0.0-incubating
>
> Attachments: HAWQ Ranger Pluggin Service Dependencies.xlsx
>
>
> Check Apache HAWQ mandatory libraries to match LC20, LC30 license criteria
> Check the following page for the criteria
> https://cwiki.apache.org/confluence/display/HAWQ/ASF+Maturity+Evaluation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] incubator-hawq issue #1335: Add setting for cloudera manager

2018-02-05 Thread lavjain
Github user lavjain commented on the issue:

https://github.com/apache/incubator-hawq/pull/1335
  
@yk-st Your change looks good. However, it might be cleaner to create a 
separate template for the CDH manager (e.g. 
`pxf-private-cdh-manager.classpath.template`) rather than substituting the 
token values. This would also ensure that PXF pipelines for GPDB are not 
affected.


---


[GitHub] incubator-hawq pull request #1334: HAWQ-1584. Don't ignore exceptions during...

2018-02-05 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1334#discussion_r166099466
  
--- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/WritableResource.java ---
@@ -143,36 +142,38 @@ private static synchronized Response synchronizedWriteResponse(Bridge bridge,
 
     private static Response writeResponse(Bridge bridge,
                                           String path,
-                                          InputStream inputStream) throws Exception {
-
-        String returnMsg;
-
+                                          InputStream inputStream)
+            throws Exception {
         // Open the output file
         bridge.beginIteration();
-
         long totalWritten = 0;
+        Exception ex = null;
 
         // dataStream will close automatically in the end of the try.
         // inputStream is closed by dataStream.close().
         try (DataInputStream dataStream = new DataInputStream(inputStream)) {
             while (bridge.setNext(dataStream)) {
                 ++totalWritten;
             }
-        } catch (ClientAbortException e) {
-            LOG.debug("Remote connection closed by HAWQ", e);
-        } catch (Exception ex) {
-            LOG.debug("totalWritten so far " + totalWritten + " to " + path);
-            throw ex;
+        } catch (ClientAbortException cae) {
+            LOG.error("Remote connection closed by HAWQ", cae);
+        } catch (Exception e) {
+            LOG.error("Exception: totalWritten so far " + totalWritten + " to " + path, e);
+            ex = e;
         } finally {
             try {
                 bridge.endIteration();
             } catch (Exception e) {
-                // ignore ... any significant errors should already have been handled
+                if (ex == null)
+                    ex = e;
--- End diff --

Another way would be to preserve throwing the exception inside the original 
catch block (line 162); then here you would say
```
if (ex == null)
    throw e;
else
    throw ex;
``` 
and you would not need the block below (lines 170-172), as the original 
exception will still be thrown if endIteration() completes without an error.
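The propagation pattern under discussion can be sketched in isolation. The names below (`write`, `endIteration`) are simplified stand-ins for the Bridge API in the diff, not the actual PXF code:

```java
// Sketch of the finally-block exception propagation pattern:
// a failure from the main body takes precedence over a failure
// from the cleanup step, but neither is silently dropped.
public class PropagationDemo {

    // Stand-in for bridge.endIteration() in the diff.
    static void endIteration(boolean fail) throws Exception {
        if (fail) throw new Exception("cleanup failed");
    }

    static void write(boolean bodyFails, boolean cleanupFails) throws Exception {
        Exception ex = null;
        try {
            if (bodyFails) throw new Exception("write failed");
        } catch (Exception e) {
            ex = e;                     // remember the primary failure
        } finally {
            try {
                endIteration(cleanupFails);
            } catch (Exception e) {
                if (ex == null) ex = e; // keep the first error, not the cleanup one
            }
            if (ex != null) throw ex;   // propagate whichever came first
        }
    }

    public static void main(String[] args) {
        try { write(true, true); } catch (Exception e) {
            System.out.println(e.getMessage());  // prints "write failed"
        }
        try { write(false, true); } catch (Exception e) {
            System.out.println(e.getMessage());  // prints "cleanup failed"
        }
    }
}
```

The point of the alternative suggested above is that once the original catch rethrows, the `finally` block only needs to decide between the body's exception and the cleanup's, so the explicit `if (ex != null) throw ex;` tail becomes unnecessary.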


---


[GitHub] incubator-hawq pull request #1334: HAWQ-1584. Don't ignore exceptions during...

2018-02-05 Thread denalex
Github user denalex commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1334#discussion_r166099856
  
--- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/WritableResource.java ---
@@ -143,36 +142,38 @@ private static synchronized Response synchronizedWriteResponse(Bridge bridge,
 
     private static Response writeResponse(Bridge bridge,
                                           String path,
-                                          InputStream inputStream) throws Exception {
-
-        String returnMsg;
-
+                                          InputStream inputStream)
+            throws Exception {
         // Open the output file
         bridge.beginIteration();
-
         long totalWritten = 0;
+        Exception ex = null;
 
         // dataStream will close automatically in the end of the try.
         // inputStream is closed by dataStream.close().
         try (DataInputStream dataStream = new DataInputStream(inputStream)) {
             while (bridge.setNext(dataStream)) {
                 ++totalWritten;
             }
-        } catch (ClientAbortException e) {
-            LOG.debug("Remote connection closed by HAWQ", e);
-        } catch (Exception ex) {
-            LOG.debug("totalWritten so far " + totalWritten + " to " + path);
-            throw ex;
+        } catch (ClientAbortException cae) {
+            LOG.error("Remote connection closed by HAWQ", cae);
+        } catch (Exception e) {
+            LOG.error("Exception: totalWritten so far " + totalWritten + " to " + path, e);
+            ex = e;
         } finally {
             try {
                 bridge.endIteration();
             } catch (Exception e) {
-                // ignore ... any significant errors should already have been handled
+                if (ex == null)
+                    ex = e;
             }
+            // propagate any exceptions
+            if (ex != null)
+                throw ex;
         }
 
         String censuredPath = Utilities.maskNonPrintables(path);
-        returnMsg = "wrote " + totalWritten + " bulks to " + censuredPath;
+        String returnMsg = "wrote " + totalWritten + " bulks to " + censuredPath;
--- End diff --

Move the string concatenation inside a `LOG.isDebugEnabled()` check.
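The suggestion above is the standard guarded-logging idiom. A minimal sketch, using `java.util.logging` as a stand-in for the commons-logging `LOG` in the diff (names and message text are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Build the message in one place so it can be constructed lazily.
    static String buildMessage(long totalWritten, String censuredPath) {
        return "wrote " + totalWritten + " bulks to " + censuredPath;
    }

    static void logWriteResult(long totalWritten, String censuredPath) {
        // Concatenate only when the debug level is actually enabled;
        // on hot write paths this skips building a throwaway string per call.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage(totalWritten, censuredPath));
        }
    }

    public static void main(String[] args) {
        logWriteResult(3, "/pxf/demo");
        System.out.println(buildMessage(3, "/pxf/demo"));  // prints "wrote 3 bulks to /pxf/demo"
    }
}
```

As the reply below this comment notes, the guard only helps when the string is used solely for logging; if the same message is also returned to the caller, it must be built unconditionally anyway.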


---


[GitHub] incubator-hawq pull request #1334: HAWQ-1584. Don't ignore exceptions during...

2018-02-05 Thread lavjain
Github user lavjain commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1334#discussion_r166122917
  
--- Diff: pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/WritableResource.java ---
@@ -143,36 +142,38 @@ private static synchronized Response synchronizedWriteResponse(Bridge bridge,
 
     private static Response writeResponse(Bridge bridge,
                                           String path,
-                                          InputStream inputStream) throws Exception {
-
-        String returnMsg;
-
+                                          InputStream inputStream)
+            throws Exception {
        // Open the output file
         bridge.beginIteration();
-
         long totalWritten = 0;
+        Exception ex = null;
 
         // dataStream will close automatically in the end of the try.
         // inputStream is closed by dataStream.close().
         try (DataInputStream dataStream = new DataInputStream(inputStream)) {
             while (bridge.setNext(dataStream)) {
                 ++totalWritten;
             }
-        } catch (ClientAbortException e) {
-            LOG.debug("Remote connection closed by HAWQ", e);
-        } catch (Exception ex) {
-            LOG.debug("totalWritten so far " + totalWritten + " to " + path);
-            throw ex;
+        } catch (ClientAbortException cae) {
+            LOG.error("Remote connection closed by HAWQ", cae);
+        } catch (Exception e) {
+            LOG.error("Exception: totalWritten so far " + totalWritten + " to " + path, e);
+            ex = e;
         } finally {
             try {
                 bridge.endIteration();
             } catch (Exception e) {
-                // ignore ... any significant errors should already have been handled
+                if (ex == null)
+                    ex = e;
             }
+            // propagate any exceptions
+            if (ex != null)
+                throw ex;
         }
 
         String censuredPath = Utilities.maskNonPrintables(path);
-        returnMsg = "wrote " + totalWritten + " bulks to " + censuredPath;
+        String returnMsg = "wrote " + totalWritten + " bulks to " + censuredPath;
--- End diff --

returnMsg is also being used in the response.


---


[GitHub] incubator-hawq pull request #1326: HAWQ-1575. Implemented readable Parquet p...

2018-02-05 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1326#discussion_r166173537
  
--- Diff: pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/utilities/HdfsUtilities.java ---
@@ -151,18 +153,42 @@ public static boolean isThreadSafe(String dataDir, String compCodec) {
      * @param fsp file split to be serialized
      * @return byte serialization of fsp
      * @throws IOException if I/O errors occur while writing to the underlying
-     * stream
+     *                     stream
      */
     public static byte[] prepareFragmentMetadata(FileSplit fsp)
             throws IOException {
-        ByteArrayOutputStream byteArrayStream = new ByteArrayOutputStream();
-        ObjectOutputStream objectStream = new ObjectOutputStream(byteArrayStream);
-        objectStream.writeLong(fsp.getStart());
-        objectStream.writeLong(fsp.getLength());
-        objectStream.writeObject(fsp.getLocations());
+
+        return prepareFragmentMetadata(fsp.getStart(), fsp.getLength(), fsp.getLocations());
+
+    }
+
+    public static byte[] prepareFragmentMetadata(long start, long length, String[] locations)
--- End diff --

Both functions are used, so I would rather keep them both for the sake of 
compatibility.
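The delegating-overload pattern in the diff (keeping the `FileSplit`-based signature while adding a field-level one) can be sketched as follows. `Split` is a hypothetical stand-in for Hadoop's `FileSplit` so the sketch stays dependency-free; the serialization mirrors the writeLong/writeLong/writeObject sequence in the diff:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class FragmentMetadataDemo {

    // Hypothetical stand-in for org.apache.hadoop.mapred.FileSplit.
    static class Split {
        final long start, length;
        final String[] locations;
        Split(long start, long length, String[] locations) {
            this.start = start; this.length = length; this.locations = locations;
        }
    }

    // Original signature, kept for compatibility: extract fields and delegate.
    public static byte[] prepareFragmentMetadata(Split fsp) throws IOException {
        return prepareFragmentMetadata(fsp.start, fsp.length, fsp.locations);
    }

    // More general overload that serializes the raw fields directly.
    public static byte[] prepareFragmentMetadata(long start, long length, String[] locations)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeLong(start);
            out.writeLong(length);
            out.writeObject(locations);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Split split = new Split(0L, 1024L, new String[] {"host1"});
        // Both entry points produce identical serialized metadata.
        System.out.println(Arrays.equals(
                prepareFragmentMetadata(split),
                prepareFragmentMetadata(0L, 1024L, new String[] {"host1"})));
    }
}
```

Because the old signature simply forwards to the new one, existing callers keep working while new callers (such as a Parquet fragmenter without a `FileSplit` in hand) can supply the fields directly.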


---


[GitHub] incubator-hawq pull request #1326: HAWQ-1575. Implemented readable Parquet p...

2018-02-05 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1326#discussion_r166173757
  
--- Diff: 
pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/ParquetFileAccessor.java
 ---
@@ -0,0 +1,168 @@
+package org.apache.hawq.pxf.plugins.hdfs;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+
+import org.apache.parquet.column.page.PageReadStore;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
+import org.apache.parquet.format.converter.ParquetMetadataConverter;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.io.ColumnIOFactory;
+import org.apache.parquet.io.MessageColumnIO;
+import org.apache.parquet.io.RecordReader;
+import org.apache.parquet.schema.MessageType;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Parquet file accessor.
+ */
+public class ParquetFileAccessor extends Plugin implements ReadAccessor {
+private ParquetFileReader reader;
+private MessageColumnIO columnIO;
+private RecordIterator recordIterator;
+private MessageType schema;
+
+
+private class RecordIterator implements Iterator {
+
+private final ParquetFileReader reader;
+private PageReadStore currentRowGroup;
+private RecordReader recordReader;
+private long rowsRemainedInRowGroup;
+
+public RecordIterator(ParquetFileReader reader) {
+this.reader = reader;
+readNextRowGroup();
+}
+
+@Override
+public boolean hasNext() {
+return rowsRemainedInRowGroup > 0;
+}
+
+@Override
+public OneRow next() {
+return new OneRow(null, readNextGroup());
+}
+
+@Override
+public void remove() {
+throw new UnsupportedOperationException();
+}
+
+private void readNextRowGroup() {
+try {
+currentRowGroup = reader.readNextRowGroup();
+} catch (IOException e) {
+throw new RuntimeException("Error occurred during reading new row group", e);
+}
+if (currentRowGroup == null)
+return;
+rowsRemainedInRowGroup = currentRowGroup.getRowCount();
+recordReader = columnIO.getRecordReader(currentRowGroup, new GroupRecordConverter(schema));
+}
+
+private Group readNextGroup() {
+Group g = null;
+if (rowsRemainedInRowGroup == 0) {
+readNextRowGroup();
+if (currentRowGroup != null) {
+g = recordReader.read();
+}
+} else {
+g = recordReader.read();
+if (g == null) {
--- End diff --

Even though the code looks slightly more complex, we are saving on invoking 
that method for every single record.
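The trade-off described above, fetching a new row group only when the current one is exhausted rather than probing the source for every record, can be sketched with a generic batched iterator. Names here are illustrative, not the actual PXF classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Iterates over records grouped into batches ("row groups"), consulting
// the group source once per group instead of once per record.
public class BatchedIterator implements Iterator<Integer> {
    private final Iterator<List<Integer>> groups;            // source of row groups
    private Iterator<Integer> current = Collections.emptyIterator();

    public BatchedIterator(Iterator<List<Integer>> groups) {
        this.groups = groups;
    }

    @Override
    public boolean hasNext() {
        // Advance to the next non-empty group only when the current one
        // is drained; per-record calls stay within the cached group.
        while (!current.hasNext() && groups.hasNext()) {
            current = groups.next().iterator();
        }
        return current.hasNext();
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new BatchedIterator(
                Arrays.asList(Arrays.asList(1, 2),
                              Collections.<Integer>emptyList(),
                              Arrays.asList(3)).iterator());
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        System.out.println(out);  // prints [1, 2, 3]
    }
}
```

The accessor in the diff applies the same idea with `PageReadStore` row groups and a per-group `RecordReader`, tracking the remaining row count instead of a sub-iterator.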


---


[GitHub] incubator-hawq pull request #1326: HAWQ-1575. Implemented readable Parquet p...

2018-02-05 Thread sansanichfb
Github user sansanichfb commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/1326#discussion_r166175619
  
--- Diff: 
pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/ParquetFileAccessor.java
 ---
@@ -0,0 +1,168 @@
+package org.apache.hawq.pxf.plugins.hdfs;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.ReadAccessor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.hawq.pxf.api.utilities.Plugin;
+import org.apache.hawq.pxf.plugins.hdfs.utilities.HdfsUtilities;
+
+import org.apache.parquet.column.page.PageReadStore;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
+import org.apache.parquet.format.converter.ParquetMetadataConverter;
+import org.apache.parquet.hadoop.ParquetFileReader;
+import org.apache.parquet.io.ColumnIOFactory;
+import org.apache.parquet.io.MessageColumnIO;
+import org.apache.parquet.io.RecordReader;
+import org.apache.parquet.schema.MessageType;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Parquet file accessor.
--- End diff --

Added.


---


[jira] [Updated] (HAWQ-1575) Implement readable Parquet profile

2018-02-05 Thread Oleksandr Diachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Diachenko updated HAWQ-1575:
--
Fix Version/s: (was: 2.4.0.0-incubating)
   2.3.0.0-incubating

> Implement readable Parquet profile
> --
>
> Key: HAWQ-1575
> URL: https://issues.apache.org/jira/browse/HAWQ-1575
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF
>Reporter: Oleksandr Diachenko
>Assignee: Ed Espino
>Priority: Major
> Fix For: 2.3.0.0-incubating
>
>
> PXF should be able to read data from Parquet files stored in HDFS.





[jira] [Commented] (HAWQ-1575) Implement readable Parquet profile

2018-02-05 Thread Oleksandr Diachenko (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353297#comment-16353297
 ] 

Oleksandr Diachenko commented on HAWQ-1575:
---

[~yjin] updated version.

 






[jira] [Resolved] (HAWQ-1575) Implement readable Parquet profile

2018-02-05 Thread Oleksandr Diachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Diachenko resolved HAWQ-1575.
---
Resolution: Fixed






[GitHub] incubator-hawq pull request #1326: HAWQ-1575. Implemented readable Parquet p...

2018-02-05 Thread sansanichfb
Github user sansanichfb closed the pull request at:

https://github.com/apache/incubator-hawq/pull/1326


---


[GitHub] incubator-hawq issue #1287: HAWQ-1527 Enabled partition filtering for integr...

2018-02-05 Thread radarwave
Github user radarwave commented on the issue:

https://github.com/apache/incubator-hawq/pull/1287
  
@outofmem0ry The PR is merged; please close this PR. Thanks.


---