[jira] [Updated] (HAWQ-961) Dispatch session user info (not current BOOTSTRAP_SUPERUSERID) on master to segments

2016-07-28 Thread Paul Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Guo updated HAWQ-961:
--
Summary: Dispatch session user info (not current BOOTSTRAP_SUPERUSERID) on 
master to segments  (was: Dispatch session user id (not current 
BOOTSTRAP_SUPERUSERID) on master to segments)

> Dispatch session user info (not current BOOTSTRAP_SUPERUSERID) on master to 
> segments
> 
>
> Key: HAWQ-961
> URL: https://issues.apache.org/jira/browse/HAWQ-961
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Paul Guo
>Assignee: Lei Chang
> Fix For: 2.0.1.0-incubating
>
>
> This does not affect the functionality or security of HAWQ, but some users 
> want the session user id info available on the segments for their own purposes.





[jira] [Closed] (HAWQ-962) Make catalog:type_sanity be able to run with other cases in parallel

2016-07-28 Thread Paul Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Guo closed HAWQ-962.
-
Resolution: Fixed
  Assignee: Paul Guo  (was: Lei Chang)

> Make catalog:type_sanity be able to run with other cases in parallel
> 
>
> Key: HAWQ-962
> URL: https://issues.apache.org/jira/browse/HAWQ-962
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Paul Guo
>Assignee: Paul Guo
> Fix For: 2.0.1.0-incubating
>
>
> The test case queries some database-level system tables. When parallel google 
> testing is enabled (see HAWQ-955: Add scripts for running feature tests in 
> parallel), the test could fail. We need to create a new database in the test 
> case to avoid this.
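
Not part of the original report: a minimal sketch of the isolation idea under assumptions. The real feature tests are C++ google tests, so the JDBC connection URL, the gpadmin user, and the placeholder query below are illustrative only; the point is creating a uniquely named database for the catalog checks so parallel cases do not race on the same system tables.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TypeSanityIsolationSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings; adjust host/port/user for a real cluster.
        String adminUrl = "jdbc:postgresql://localhost:5432/postgres";
        String dbName = "type_sanity_" + System.currentTimeMillis(); // unique per run

        try (Connection admin = DriverManager.getConnection(adminUrl, "gpadmin", "");
             Statement st = admin.createStatement()) {
            // A private database keeps the catalog queries of this case away
            // from other cases running in parallel.
            st.execute("CREATE DATABASE " + dbName);
        }

        try (Connection test = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/" + dbName, "gpadmin", "");
             Statement st = test.createStatement()) {
            st.execute("SELECT count(*) FROM pg_type"); // placeholder for the real type_sanity checks
        }

        try (Connection admin = DriverManager.getConnection(adminUrl, "gpadmin", "");
             Statement st = admin.createStatement()) {
            st.execute("DROP DATABASE " + dbName); // clean up after the case
        }
    }
}
{code}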





[jira] [Closed] (HAWQ-900) Add dependency in PL/R rpm build spec file plr.spec

2016-07-28 Thread Paul Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Guo closed HAWQ-900.
-
Resolution: Fixed
  Assignee: Paul Guo  (was: Lei Chang)

> Add dependency in PL/R rpm build spec file plr.spec
> ---
>
> Key: HAWQ-900
> URL: https://issues.apache.org/jira/browse/HAWQ-900
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Paul Guo
>Assignee: Paul Guo
> Fix For: 2.0.1.0-incubating
>
>
> Building PL/R depends on R-devel, while using PL/R depends on R. In theory 
> they also depend on HAWQ, but a HAWQ rpm does not appear to be mandatory for a 
> HAWQ installation, so the dependencies can be limited to the R packages only.





[GitHub] incubator-hawq pull request #824: HAWQ-962. Make catalog:type_sanity be able...

2016-07-28 Thread paul-guo-
Github user paul-guo- closed the pull request at:

https://github.com/apache/incubator-hawq/pull/824




[GitHub] incubator-hawq pull request #778: HAWQ-900. Add dependency in PL/R rpm build...

2016-07-28 Thread paul-guo-
Github user paul-guo- closed the pull request at:

https://github.com/apache/incubator-hawq/pull/778




[GitHub] incubator-hawq issue #824: HAWQ-962. Make catalog:type_sanity be able to run...

2016-07-28 Thread ictmalili
Github user ictmalili commented on the issue:

https://github.com/apache/incubator-hawq/pull/824
  
LGTM. +1




[GitHub] incubator-hawq issue #824: HAWQ-962. Make catalog:type_sanity be able to run...

2016-07-28 Thread wengyanqing
Github user wengyanqing commented on the issue:

https://github.com/apache/incubator-hawq/pull/824
  
LGTM




[jira] [Updated] (HAWQ-958) LICENSE file missing checklist

2016-07-28 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-958:
---
Description: 
From [~jmclean]'s IPMC release VOTE feedback:

{quote}
- Please use the short form of the license, linking to the license files in LICENSE
- BSD licensed code [3] 3. ./tools/bin/pythonSrc/unittest2-0.5.1/setup.py  
[~rlei] can you check this one as you're the last committer through HAWQ-837
- BSD license code [7] 7. ./depends/thirdparty/thrift/compiler/cpp/src/md5.? 
[~xunzhang] , this one was you through HAWQ-735
- license for this file [9] 9. ./src/backend/port/dynloader/ultrix4.h   - this 
one seems to be postgres license  plus a couple others.
- license for this file [10] Are we OK that this was taken from GNU C? 10. 
./src/port/inet_aton.c 
- MIT license PSI [11] 11. ./tools/bin/pythonSrc/PSI-0.3b2_gp/
- BSD licensed code [12] 12. ./src/port/snprintf.c
- BSD licensed code [13] 13. ./src/port/crypt.c  Is this regarded as cryptography 
code? [14] 14. http://www.apache.org/dev/crypto.html
- BSD licensed code [15][16] 15. ./src/port/memcmp.c , 16. 
./src/backend/utils/mb/wstrcmp.c
- license for this file [17] 17. ./src/port/rand.c
- license of these files [18][19] 18. ./src/backend/utils/adt/inet_net_ntop.c
19. ./src/backend/utils/adt/inet_net_pton.c
- license of this file [20] 20 ./src/port/strlcpy.c
- regex license [21] 21. ./src/backend/regex/COPYRIGHT
- How are these files licensed? [22] + others copyright AEG Automation GmbH 22. 
./src/backend/port/qnx4/shm.c
- How is this file licensed? [23] 23. ./src/backend/port/beos/shm.c
- BSD licensed libpq [24]. 24. ./src/backend/libpq/sha2.?
Is this considered crypto code and may need an export license?
- pgdump [25] 25. ./src/bin/pg_dump/
- license for this file [26] 26. ./src/port/gettimeofday.c
- license for this file [27] Looks like an ASF header may have been incorrectly 
added to this. 27. 
./depends/thirdparty/thrift/lib/cpp/src/thrift/windows/SocketPair.cpp
- This BSD licensed file [36] 36. ./src/bin/pg_controldata/pg_controldata.c
- license for these files [37][38] and others in [39]
37. ./depends/thirdparty/thrift/aclocal/ax_cxx_compile_stdcxx_11.m4
38. ./depends/thirdparty/thrift/aclocal/ax_boost_base.m4
39. ./depends/thirdparty/thrift/aclocal
- This BSD licensed file [40]
40. ./depends/thirdparty/thrift/build/cmake/FindGLIB.cmake
- This BSD licensed file [41]
41. ./tools/bin/pythonSrc/unittest2-0.5.1/setup.py
- BSD licensed pychecker [42]
42. ./tools/bin/pythonSrc/pychecker-0.8.18/
- licenses for all of these files [43]
43. ./src/interfaces/libpq/po/*.po
- BSD license pg800 [44]
44. ./tools/bin/ext/pg8000/*
- how is this file licensed? [45]
45. ./src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
- license for this file [47] 47 
./tools/bin/pythonSrc/lockfile-0.9.1/lockfile/pidlockfile.py
- Python license for this file [48]. Is this an Apache-compatible license? 48 
./tools/bin/pythonSrc/pychecker-0.8.18/pychecker2/symbols.py
- How are these files licensed? [49] Note multiple copyright owners and missing 
headers.
49.  ./src/backend/utils/mb/Unicode/*
- BSD licensed figleaf. [50] Note that these files have incorrectly had ASF headers 
applied.
50. ./tools/bin/ext/figleaf/*
- This BSD licensed file [51]
51. ./depends/thirdparty/thrift/lib/py/compat/win32/stdint.h
- This public domain style sheet [52]
52. ./tools/bin/pythonSrc/PyGreSQL-4.0/docs/default.css
- This file [53]
53. ./src/test/locale/test-ctype.c
- License for unit test2 [54]
54 ./tools/bin/pythonSrc/unittest2-0.5.1/unittest2/
- MIT licensed lock file [55]
55. ./tools/bin/pythonSrc/lockfile-0.9.1/LICENSE
- JSON code here [56]
56. ./src/include/catalog/JSON
- License for this file [57]
57. ./src/pl/plperl/ppport.h

Looks like GPL/LGPL licensed code may be included [4][5][6] in the release.
4. ./depends/thirdparty/thrift/debian/copyright (end of file)
5. ./depends/thirdparty/thrift/doc/licenses/lgpl-2.1.txt
6. ./tools/bin/gppylib/operations/test/test_package.py

This file [8] and others(?) may incorrectly have an ASF header on it. Also, why 
does this file have an ASF header with a copyright line? [46]
8. ./tools/sbin/hawqstandbywatch.py
46. 
./contrib/hawq-hadoop/hawq-mapreduce-tool/src/test/resources/log4j.properties

The code includes code licensed under the 4-clause BSD license, which is not 
compatible with the Apache 2.0 license. [28][29][30][31][32][33] It may be that 
this clause has been rescinded [35] and it is OK to include, but that needs to 
be checked.
28. ./src/backend/port/dynloader/freebsd.c
29. ./src/backend/port/dynloader/netbsd.c
30. ./src/backend/port/dynloader/openbsd.c
31. ./src/bin/gpfdist/src/gpfdist/glob.c
32. ./src/bin/gpfdist/src/gpfdist/include/glob.h
33. ./src/include/port/win32_msvc/glob.h
34. ./src/port/glob.c -- [Goden] this was not in Justin's original feedback, but 
given the context, I think it falls under the same comment as [28]-[33] and [35] 
35. ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.

[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread shivzone
Github user shivzone commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72709587
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCAccessor.java
 ---
@@ -0,0 +1,170 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hawq.pxf.api.FilterParser;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.commons.lang.StringUtils;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static 
org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter.PXF_HIVE_SERDES;
+
+/**
+ * Specialization of HiveAccessor for a Hive table that stores only ORC 
files.
+ * This class replaces the generic HiveAccessor for a case where a table 
is stored entirely as ORC files.
+ * Use together with {@link HiveInputFormatFragmenter}/{@link 
HiveColumnarSerdeResolver}
+ */
+public class HiveORCAccessor extends HiveAccessor {
+
+private RecordReader batchReader = null;
+private Reader reader = null;
+private VectorizedRowBatch batch = null;
+
+private final String READ_COLUMN_IDS_CONF_STR = 
"hive.io.file.readcolumn.ids";
+private final String READ_ALL_COLUMNS = 
"hive.io.file.read.all.columns";
+private final String READ_COLUMN_NAMES_CONF_STR = 
"hive.io.file.readcolumn.names";
+private final String SARG_PUSHDOWN = "sarg.pushdown";
+
+/**
+ * Constructs a HiveRCFileAccessor.
+ *
+ * @param input input containing user data
+ * @throws Exception if user data was wrong
+ */
+public HiveORCAccessor(InputData input) throws Exception {
+super(input, new OrcInputFormat());
+String[] toks = HiveInputFormatFragmenter.parseToks(input, 
PXF_HIVE_SERDES.COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.LAZY_BINARY_COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.ORC_SERDE.name(), PXF_HIVE_SERDES.VECTORIZED_ORC_SERDE.name());
+initPartitionFields(toks[HiveInputFormatFragmenter.TOK_KEYS]);
+filterInFragmenter = new 
Boolean(toks[HiveInputFormatFragmenter.TOK_FILTER_DONE]);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+addColumns();
+addFilters();
+return super.openForRead();
+}
+
+@Override
+protected Object getReader(JobConf jobConf, InputSplit split)
+throws IOException {
+return inputFormat.getRecordReader(split, jobConf, Reporter.NULL);
+}
+
+/**
+ * Adds the table tuple description to JobConf object
+ * so only these columns will be returned.
+ */
+private void addColumns() throws Exception {
+
+List colIds = new ArrayList();
+List colNames = new ArrayList();
+for(ColumnDescriptor col: inputData.getTupleDescription()) {
+if(col.isProjected()) {
+colIds.add(String.valueOf(col.columnIndex()));
--- End diff --

Will update



[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread shivzone
Github user shivzone commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72708290
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCAccessor.java
 ---
@@ -0,0 +1,170 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hawq.pxf.api.FilterParser;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.commons.lang.StringUtils;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static 
org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter.PXF_HIVE_SERDES;
+
+/**
+ * Specialization of HiveAccessor for a Hive table that stores only ORC 
files.
+ * This class replaces the generic HiveAccessor for a case where a table 
is stored entirely as ORC files.
+ * Use together with {@link HiveInputFormatFragmenter}/{@link 
HiveColumnarSerdeResolver}
+ */
+public class HiveORCAccessor extends HiveAccessor {
+
+private RecordReader batchReader = null;
+private Reader reader = null;
+private VectorizedRowBatch batch = null;
+
+private final String READ_COLUMN_IDS_CONF_STR = 
"hive.io.file.readcolumn.ids";
+private final String READ_ALL_COLUMNS = 
"hive.io.file.read.all.columns";
+private final String READ_COLUMN_NAMES_CONF_STR = 
"hive.io.file.readcolumn.names";
+private final String SARG_PUSHDOWN = "sarg.pushdown";
+
+/**
+ * Constructs a HiveRCFileAccessor.
+ *
+ * @param input input containing user data
+ * @throws Exception if user data was wrong
+ */
+public HiveORCAccessor(InputData input) throws Exception {
+super(input, new OrcInputFormat());
+String[] toks = HiveInputFormatFragmenter.parseToks(input, 
PXF_HIVE_SERDES.COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.LAZY_BINARY_COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.ORC_SERDE.name(), PXF_HIVE_SERDES.VECTORIZED_ORC_SERDE.name());
+initPartitionFields(toks[HiveInputFormatFragmenter.TOK_KEYS]);
+filterInFragmenter = new 
Boolean(toks[HiveInputFormatFragmenter.TOK_FILTER_DONE]);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+addColumns();
+addFilters();
+return super.openForRead();
+}
+
+@Override
+protected Object getReader(JobConf jobConf, InputSplit split)
+throws IOException {
+return inputFormat.getRecordReader(split, jobConf, Reporter.NULL);
+}
+
+/**
+ * Adds the table tuple description to JobConf object
+ * so only these columns will be returned.
+ */
+private void addColumns() throws Exception {
+
+List colIds = new ArrayList();
+List colNames = new ArrayList();
+for(ColumnDescriptor col: inputData.getTupleDescription()) {
+if(col.isProjected()) {
+colIds.add(String.valueOf(col.columnIndex()));
+colNames.add(col.columnName());
+}
+}
+jobConf.set(READ_ALL

[GitHub] incubator-hawq issue #812: HAWQ-949. Revert serializing floats in pxf string...

2016-07-28 Thread kavinderd
Github user kavinderd commented on the issue:

https://github.com/apache/incubator-hawq/pull/812
  
Already reverted




[GitHub] incubator-hawq issue #817: HAWQ-954. Check that ExternalSelectDesc reference...

2016-07-28 Thread kavinderd
Github user kavinderd commented on the issue:

https://github.com/apache/incubator-hawq/pull/817
  
Merged




[GitHub] incubator-hawq pull request #812: HAWQ-949. Revert serializing floats in pxf...

2016-07-28 Thread kavinderd
Github user kavinderd closed the pull request at:

https://github.com/apache/incubator-hawq/pull/812




[GitHub] incubator-hawq pull request #817: HAWQ-954. Check that ExternalSelectDesc re...

2016-07-28 Thread kavinderd
Github user kavinderd closed the pull request at:

https://github.com/apache/incubator-hawq/pull/817




[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread shivzone
Github user shivzone commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72691158
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCAccessor.java
 ---
@@ -0,0 +1,170 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hawq.pxf.api.FilterParser;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.commons.lang.StringUtils;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static 
org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter.PXF_HIVE_SERDES;
+
+/**
+ * Specialization of HiveAccessor for a Hive table that stores only ORC 
files.
+ * This class replaces the generic HiveAccessor for a case where a table 
is stored entirely as ORC files.
+ * Use together with {@link HiveInputFormatFragmenter}/{@link 
HiveColumnarSerdeResolver}
+ */
+public class HiveORCAccessor extends HiveAccessor {
+
+private RecordReader batchReader = null;
+private Reader reader = null;
+private VectorizedRowBatch batch = null;
+
+private final String READ_COLUMN_IDS_CONF_STR = 
"hive.io.file.readcolumn.ids";
+private final String READ_ALL_COLUMNS = 
"hive.io.file.read.all.columns";
+private final String READ_COLUMN_NAMES_CONF_STR = 
"hive.io.file.readcolumn.names";
+private final String SARG_PUSHDOWN = "sarg.pushdown";
+
+/**
+ * Constructs a HiveRCFileAccessor.
+ *
+ * @param input input containing user data
+ * @throws Exception if user data was wrong
+ */
+public HiveORCAccessor(InputData input) throws Exception {
+super(input, new OrcInputFormat());
+String[] toks = HiveInputFormatFragmenter.parseToks(input, 
PXF_HIVE_SERDES.COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.LAZY_BINARY_COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.ORC_SERDE.name(), PXF_HIVE_SERDES.VECTORIZED_ORC_SERDE.name());
+initPartitionFields(toks[HiveInputFormatFragmenter.TOK_KEYS]);
+filterInFragmenter = new 
Boolean(toks[HiveInputFormatFragmenter.TOK_FILTER_DONE]);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+addColumns();
+addFilters();
+return super.openForRead();
+}
+
+@Override
+protected Object getReader(JobConf jobConf, InputSplit split)
+throws IOException {
+return inputFormat.getRecordReader(split, jobConf, Reporter.NULL);
--- End diff --

don't quite need this. 




[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread shivzone
Github user shivzone commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72687780
  
--- Diff: pxf/gradle.properties ---
@@ -23,4 +23,5 @@ hiveVersion=1.2.1
 hbaseVersionJar=1.1.2
 hbaseVersionRPM=1.1.2
 tomcatVersion=7.0.62
-pxfProtocolVersion=v14
\ No newline at end of file
+pxfProtocolVersion=v14
+orcVersion=1.1.1
--- End diff --

Good point. Will remove this dependency.




[jira] [Commented] (HAWQ-583) Extend PXF to allow plugins to support returning partial content of SELECT(column projection)

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398044#comment-15398044
 ] 

Shivram Mani commented on HAWQ-583:
---

Refer to HAWQ-927 for the http header params introduced to support column 
projection.
If the table has 3 columns col1, col2, col3, the following table describes the 
PXF response corresponding to various types of query patterns:

||Query||X-GP-ATTRS-PROJ||X-GP-ATTRS-PROJ-IDX||PXF Action||
| select col1 |1|0|Project col1 data with NULL values padded in col2 and col3|
| select col1,col2 |2|0, 1|Project col1 and col2 data with NULL values in col3|
| select count(col2) |1|1|Project col2 with NULL values in col1 and col3|
| select count ( * ) |0|NULL|Project only one column. We will pick col1 for consistency purposes|
| select * |NULL|NULL|Project all columns|
| select col1,* |4|0,0,1,2|Project col1,col2 and col3|
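
Not part of the comment above: a minimal sketch of how the PXF side could turn an X-GP-ATTRS-PROJ-IDX value from the table into a set of projected column indexes. The helper class and parsing logic are illustrative assumptions, not the actual PXF protocol code.

{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; PXF reads these values through its own protocol layer
// rather than from raw header strings.
public class ProjectionHeaders {
    /** Parses an X-GP-ATTRS-PROJ-IDX value such as "0, 1" or "0,0,1,2" into column indexes. */
    public static Set<Integer> projectedColumns(String attrsProjIdx) {
        Set<Integer> indexes = new LinkedHashSet<Integer>();
        if (attrsProjIdx != null && !attrsProjIdx.isEmpty()) {
            for (String idx : attrsProjIdx.split(",")) {
                indexes.add(Integer.parseInt(idx.trim()));
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        // "select col1,col2" above: X-GP-ATTRS-PROJ-IDX = "0, 1" -> project columns 0 and 1
        System.out.println(projectedColumns("0, 1"));
        // "select col1,*" above: "0,0,1,2" -> duplicates collapse, so col1, col2 and col3 are projected
        System.out.println(projectedColumns("0,0,1,2"));
    }
}
{code}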

> Extend PXF to allow plugins to support returning partial content of 
> SELECT(column projection)
> -
>
> Key: HAWQ-583
> URL: https://issues.apache.org/jira/browse/HAWQ-583
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF
>Reporter: Michael Andre Pearce (IG)
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> Currently PXF supports being able to push down the predicate WHERE logic to 
> the external system to reduce the amount of data that needs to be retrieved.
> SELECT a, b FROM external_pxf_source WHERE z < 3 AND x > 6
> As such we can filter the rows returned, but currently we would still have to 
> return all the fields / the complete row.
> This proposal is so that we can return only the columns in the SELECT part.
> For data sources that use columnar storage, or that are selectable such as a 
> remote database PXF can read or connect to, this has advantages in the amount 
> of data that needs to be accessed or even transferred.
> As with the push-down filter, it should be optional, so that plugins that 
> provide support can use it while others that do not continue to work as they 
> do.
> The proposal would be to:
> 1) create an interface for plugins to optionally implement, through which the 
> columns that need to be returned are given to the plugin.
> 2) update the pxf api for hawq to send the columns defined in SELECT, and for 
> pxf to invoke the plugin interface and pass this information on if provided.
> 3) update the pxf integration within hawq itself so that hawq passes this 
> additional information to pxf.
> This Ticket is off the back of discussion on HAWQ-492.
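
Not from the ticket itself: a minimal sketch, under assumptions, of what the optional interface in item 1) could look like. The interface name and method signature are hypothetical illustrations, not the API that was eventually implemented.

{code:java}
import java.util.List;

/**
 * Hypothetical optional interface for PXF plugins that can limit the columns
 * they read. Plugins that do not implement it keep their current behavior of
 * returning complete rows.
 */
public interface SupportsColumnProjection {
    /**
     * Called before reading starts with the columns referenced in the SELECT list.
     *
     * @param projectedColumnIndexes zero-based indexes of the requested columns,
     *                               in table order; an empty list means "all columns"
     */
    void setProjectedColumns(List<Integer> projectedColumnIndexes);
}
{code}

An ORC-backed accessor, for example, could map those indexes onto the hive.io.file.readcolumn.ids setting, much as the HiveORCAccessor diff further down in this digest does.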





[jira] [Issue Comment Deleted] (HAWQ-583) Extend PXF to allow plugins to support returning partial content of SELECT(column projection)

2016-07-28 Thread Shivram Mani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivram Mani updated HAWQ-583:
--
Comment: was deleted

(was: Refer to HAWQ-927 for the http header params introduced to support column 
projection.
If the table has 3 columns col1, col2, col3, the following table describes the 
PXF response corresponding to various types of query patterns:

||Query||X-GP-ATTRS-PROJ||X-GP-ATTRS-PROJ-IDX||PXF Action||
| select col1 |1|0|Project col1 data with NULL values padded in col2 and col3|
| select col1,col2 |2|0, 1|Project col1 and col2 data with NULL values in col3|
| select count(col2) |1|1|Project col2 with NULL values in col1 and col3|
| select count ( * ) |0|NULL|Project only one column. We will pick col1 for consistency purposes|
| select * |NULL|NULL|Project all columns|
| select col1,* |4|0,0,1,2|Project col1,col2 and col3|)

> Extend PXF to allow plugins to support returning partial content of 
> SELECT(column projection)
> -
>
> Key: HAWQ-583
> URL: https://issues.apache.org/jira/browse/HAWQ-583
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF
>Reporter: Michael Andre Pearce (IG)
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> Currently PXF supports being able to push down the predicate WHERE logic to 
> the external system to reduce the amount of data that needs to be retrieved.
> SELECT a, b FROM external_pxf_source WHERE z < 3 AND x > 6
> As such we can filter the rows returned, but currently we would still have to 
> return all the fields / the complete row.
> This proposal is so that we can return only the columns in the SELECT part.
> For data sources that use columnar storage, or that are selectable such as a 
> remote database PXF can read or connect to, this has advantages in the amount 
> of data that needs to be accessed or even transferred.
> As with the push-down filter, it should be optional, so that plugins that 
> provide support can use it while others that do not continue to work as they 
> do.
> The proposal would be to:
> 1) create an interface for plugins to optionally implement, through which the 
> columns that need to be returned are given to the plugin.
> 2) update the pxf api for hawq to send the columns defined in SELECT, and for 
> pxf to invoke the plugin interface and pass this information on if provided.
> 3) update the pxf integration within hawq itself so that hawq passes this 
> additional information to pxf.
> This Ticket is off the back of discussion on HAWQ-492.





[jira] [Resolved] (HAWQ-949) Hawq sending unsupported serialized float filter data to PXF

2016-07-28 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao resolved HAWQ-949.

Resolution: Invalid

HAWQ-779 was reverted, so this is no longer valid.

> Hawq sending unsupported serialized float filter data to PXF
> 
>
> Key: HAWQ-949
> URL: https://issues.apache.org/jira/browse/HAWQ-949
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: External Tables, PXF
>Reporter: Kavinder Dhaliwal
>Assignee: Kavinder Dhaliwal
> Fix For: 2.0.1.0-incubating
>
>
> HAWQ-779 introduced support on the C side for Float to be serialized into the 
> filter header sent to PXF. However, changes were not made to the FilterParser 
> class in PXF to support parsing non-Int numeric types.
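
Not from the issue: a standalone sketch of the parsing gap described above, i.e. accepting float constants in addition to integers when reading a filter constant. It is illustrative only and does not reflect the actual FilterParser code or its serialized filter format.

{code:java}
public class NumericConstantSketch {
    // Parse a numeric filter constant that may be an integer or a float.
    static Number parseConstant(String token) {
        try {
            return Long.parseLong(token);      // the integer case PXF already handled
        } catch (NumberFormatException e) {
            return Double.parseDouble(token);  // the non-Int numeric case HAWQ-779 started sending
        }
    }

    public static void main(String[] args) {
        System.out.println(parseConstant("42"));    // 42
        System.out.println(parseConstant("3.14"));  // 3.14
    }
}
{code}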





[jira] [Comment Edited] (HAWQ-583) Extend PXF to allow plugins to support returning partial content of SELECT(column projection)

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398007#comment-15398007
 ] 

Shivram Mani edited comment on HAWQ-583 at 7/28/16 7:10 PM:


Refer to HAWQ-927 for the http header params introduced to support column 
projection.
If the table has 3 columns col1, col2, col3, the following table describes the 
PXF response corresponding to various types of query patterns:

||Query||X-GP-ATTRS-PROJ||X-GP-ATTRS-PROJ-IDX||PXF Action||
| select col1 |1|0|Project col1 data with NULL values padded in col2 and col3|
| select col1,col2 |2|0, 1|Project col1 and col2 data with NULL values in col3|
| select count(col2) |1|1|Project col2 with NULL values in col1 and col3|
| select count ( * ) |0|NULL|Project only one column. We will pick col1 for consistency purposes|
| select * |NULL|NULL|Project all columns|
| select col1,* |4|0,0,1,2|Project col1,col2 and col3|


was (Author: shivram):
Refer to HAWQ-927 for the http header params introduced to support column 
projection.
If the table has 3 columns col1, col2, col3, the following table describes the 
PXF response corresponding to various types of query patterns:

||Query||X-GP-ATTRS-PROJ||X-GP-ATTRS-PROJ-IDX||PXF Action||
| select col1 |1|0|Project col1 data with NULL values padded in col2 and col3|
| select col1,col2 |2|0, 1|Project col1 and col2 data with NULL values in col3|
| select count(col2) |1|1|Project col2 with NULL values in col1 and col3|
| select count(*) |0|NULL|Project only one column. We will pick col1 for consistency purposes|
| select * |NULL|NULL|Project all columns|
| select col1,* |4|0,0,1,2|Project col1,col2 and col3|

> Extend PXF to allow plugins to support returning partial content of 
> SELECT(column projection)
> -
>
> Key: HAWQ-583
> URL: https://issues.apache.org/jira/browse/HAWQ-583
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF
>Reporter: Michael Andre Pearce (IG)
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> Currently PXF supports being able to push down the predicate WHERE logic to 
> the external system to reduce the amount of data that needs to be retrieved.
> SELECT a, b FROM external_pxf_source WHERE z < 3 AND x > 6
> As such we can filter the rows returned, but currently we would still have to 
> return all the fields / the complete row.
> This proposal is so that we can return only the columns in the SELECT part.
> For data sources that use columnar storage, or that are selectable such as a 
> remote database PXF can read or connect to, this has advantages in the amount 
> of data that needs to be accessed or even transferred.
> As with the push-down filter, it should be optional, so that plugins that 
> provide support can use it while others that do not continue to work as they 
> do.
> The proposal would be to:
> 1) create an interface for plugins to optionally implement, through which the 
> columns that need to be returned are given to the plugin.
> 2) update the pxf api for hawq to send the columns defined in SELECT, and for 
> pxf to invoke the plugin interface and pass this information on if provided.
> 3) update the pxf integration within hawq itself so that hawq passes this 
> additional information to pxf.
> This Ticket is off the back of discussion on HAWQ-492.





[GitHub] incubator-hawq issue #808: HAWQ-944. Implement new pg_ltoa function as per p...

2016-07-28 Thread kavinderd
Github user kavinderd commented on the issue:

https://github.com/apache/incubator-hawq/pull/808
  
Added `INT32_CHAR_SIZE` to `configure.in`




[jira] [Commented] (HAWQ-583) Extend PXF to allow plugins to support returning partial content of SELECT(column projection)

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398007#comment-15398007
 ] 

Shivram Mani commented on HAWQ-583:
---

Refer to HAWQ-927 for the http header params introduced to support column 
projection.
If the table has 3 columns col1, col2, col3, the following table describes the 
PXF response corresponding to various types of query patterns:

||Query||X-GP-ATTRS-PROJ||X-GP-ATTRS-PROJ-IDX||PXF Action||
| select col1 |1|0|Project col1 data with NULL values padded in col2 and col3|
| select col1,col2 |2|0, 1|Project col1 and col2 data with NULL values in col3|
| select count(col2) |1|1|Project col2 with NULL values in col1 and col3|
| select count(*) |0|NULL|Project only one column. We will pick col1 for consistency purposes|
| select * |NULL|NULL|Project all columns|
| select col1,* |4|0,0,1,2|Project col1,col2 and col3|

> Extend PXF to allow plugins to support returning partial content of 
> SELECT(column projection)
> -
>
> Key: HAWQ-583
> URL: https://issues.apache.org/jira/browse/HAWQ-583
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF
>Reporter: Michael Andre Pearce (IG)
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> Currently PXF supports being able to push down the predicate WHERE logic to 
> the external system to reduce the amount of data that needs to be retrieved.
> SELECT a, b FROM external_pxf_source WHERE z < 3 AND x > 6
> As such we can filter the rows returned, but currently we would still have to 
> return all the fields / the complete row.
> This proposal is so that we can return only the columns in the SELECT part.
> For data sources that use columnar storage, or that are selectable such as a 
> remote database PXF can read or connect to, this has advantages in the amount 
> of data that needs to be accessed or even transferred.
> As with the push-down filter, it should be optional, so that plugins that 
> provide support can use it while others that do not continue to work as they 
> do.
> The proposal would be to:
> 1) create an interface for plugins to optionally implement, through which the 
> columns that need to be returned are given to the plugin.
> 2) update the pxf api for hawq to send the columns defined in SELECT, and for 
> pxf to invoke the plugin interface and pass this information on if provided.
> 3) update the pxf integration within hawq itself so that hawq passes this 
> additional information to pxf.
> This Ticket is off the back of discussion on HAWQ-492.





[GitHub] incubator-hawq pull request #820: HAWQ-953 hawq pxf-hive support partition c...

2016-07-28 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/820#discussion_r72680191
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java
 ---
@@ -470,3 +476,4 @@ public FragmentsStats getFragmentsStats() throws 
Exception {
 "ANALYZE for Hive plugin is not supported");
 }
 }
+
--- End diff --

Remove whitespace




[GitHub] incubator-hawq pull request #820: HAWQ-953 hawq pxf-hive support partition c...

2016-07-28 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/820#discussion_r72680060
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java
 ---
@@ -331,7 +331,7 @@ String serializePartitionKeys(HiveTablePartition 
partData) throws Exception {
 if (partData.partition == null) /*
  * this is a simple hive table - 
there
  * are no partitions
- */{
+ */ {
--- End diff --

Whitespace, please remove




[GitHub] incubator-hawq pull request #820: HAWQ-953 hawq pxf-hive support partition c...

2016-07-28 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/820#discussion_r72679690
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java
 ---
@@ -20,11 +20,7 @@
  */
 
 import java.io.ByteArrayOutputStream;
-import java.util.List;
-import java.util.ListIterator;
-import java.util.Properties;
-import java.util.Set;
-import java.util.TreeSet;
+import java.util.*;
--- End diff --

Can you revert back to importing each individual class? The wildcard imports an 
unnecessary number of classes.




[GitHub] incubator-hawq issue #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread kavinderd
Github user kavinderd commented on the issue:

https://github.com/apache/incubator-hawq/pull/821
  
LGTM, nice job




[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72679394
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCAccessor.java
 ---
@@ -0,0 +1,170 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hawq.pxf.api.FilterParser;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.commons.lang.StringUtils;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static 
org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter.PXF_HIVE_SERDES;
+
+/**
+ * Specialization of HiveAccessor for a Hive table that stores only ORC 
files.
+ * This class replaces the generic HiveAccessor for a case where a table 
is stored entirely as ORC files.
+ * Use together with {@link HiveInputFormatFragmenter}/{@link 
HiveColumnarSerdeResolver}
+ */
+public class HiveORCAccessor extends HiveAccessor {
+
+private RecordReader batchReader = null;
+private Reader reader = null;
+private VectorizedRowBatch batch = null;
+
+private final String READ_COLUMN_IDS_CONF_STR = 
"hive.io.file.readcolumn.ids";
+private final String READ_ALL_COLUMNS = 
"hive.io.file.read.all.columns";
+private final String READ_COLUMN_NAMES_CONF_STR = 
"hive.io.file.readcolumn.names";
+private final String SARG_PUSHDOWN = "sarg.pushdown";
+
+/**
+ * Constructs a HiveRCFileAccessor.
+ *
+ * @param input input containing user data
+ * @throws Exception if user data was wrong
+ */
+public HiveORCAccessor(InputData input) throws Exception {
+super(input, new OrcInputFormat());
+String[] toks = HiveInputFormatFragmenter.parseToks(input, 
PXF_HIVE_SERDES.COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.LAZY_BINARY_COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.ORC_SERDE.name(), PXF_HIVE_SERDES.VECTORIZED_ORC_SERDE.name());
+initPartitionFields(toks[HiveInputFormatFragmenter.TOK_KEYS]);
+filterInFragmenter = new 
Boolean(toks[HiveInputFormatFragmenter.TOK_FILTER_DONE]);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+addColumns();
+addFilters();
+return super.openForRead();
+}
+
+@Override
+protected Object getReader(JobConf jobConf, InputSplit split)
+throws IOException {
+return inputFormat.getRecordReader(split, jobConf, Reporter.NULL);
+}
+
+/**
+ * Adds the table tuple description to JobConf object
+ * so only these columns will be returned.
+ */
+private void addColumns() throws Exception {
+
+List colIds = new ArrayList();
+List colNames = new ArrayList();
+for(ColumnDescriptor col: inputData.getTupleDescription()) {
+if(col.isProjected()) {
+colIds.add(String.valueOf(col.columnIndex()));
--- End diff --

Do we need to convert Integer to String here? I think `StringUtils.join` 
can take a `List
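
The comment above is cut off in the digest; it presumably points out that commons-lang's `StringUtils.join` can take a collection directly. A small sketch, assuming commons-lang 2.x on the classpath:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.StringUtils;

public class JoinColumnIds {
    public static void main(String[] args) {
        List<Integer> colIds = new ArrayList<Integer>();
        colIds.add(0);
        colIds.add(2);
        // join(Collection, String) calls toString() on each element, so the
        // explicit String.valueOf(col.columnIndex()) conversion is not required.
        System.out.println(StringUtils.join(colIds, ",")); // prints 0,2
    }
}
```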

[GitHub] incubator-hawq issue #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread kavinderd
Github user kavinderd commented on the issue:

https://github.com/apache/incubator-hawq/pull/821
  
@shivzone Can you explain why we need a separate Resolver for ORC?




[jira] [Updated] (HAWQ-927) Send Projection Info Data from HAWQ to PXF

2016-07-28 Thread Shivram Mani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivram Mani updated HAWQ-927:
--
Description: 
To achieve column projection at the level of PXF or the underlying readers we 
need to first send this data as a Header/Param to PXF. Currently, PXF has no 
knowledge whether a query requires all columns or a subset of columns.
The proposal is to send the following two attributes as part of the HTTP headers 
in the REST API from HAWQ to PXF:
X-GP-ATTRS-PROJ - Indicates the number of attributes/columns to be projected
X-GP-ATTRS-PROJ-IDX - Indicates the column index(es) to be projected

  was:To achieve column projection at the level of PXF or the underlying 
readers we need to first send this data as a Header/Param to PXF. Currently, 
PXF has no knowledge whether a query requires all columns or a subset of 
columns.


> Send Projection Info Data from HAWQ to PXF
> --
>
> Key: HAWQ-927
> URL: https://issues.apache.org/jira/browse/HAWQ-927
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: External Tables, PXF
>Reporter: Kavinder Dhaliwal
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> To achieve column projection at the level of PXF or the underlying readers we 
> need to first send this data as a Header/Param to PXF. Currently, PXF has no 
> knowledge whether a query requires all columns or a subset of columns.
> The proposal is to send the following two attributes as part of the HTTP headers 
> in the REST API from HAWQ to PXF:
> X-GP-ATTRS-PROJ - Indicates the number of attributes/columns to be projected
> X-GP-ATTRS-PROJ-IDX - Indicates the column index(es) to be projected
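
Not from the issue: a small sketch of the shape of the two proposed values. The real headers are added by HAWQ's C-side PXF bridge, not by Java code, so this map-based illustration is an assumption made only to show the format.

{code:java}
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectionHeaderValues {
    // Builds the two proposed header values for a given list of projected column indexes.
    static Map<String, String> headersFor(List<Integer> projectedIndexes) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        headers.put("X-GP-ATTRS-PROJ", String.valueOf(projectedIndexes.size()));
        StringBuilder idx = new StringBuilder();
        for (int i = 0; i < projectedIndexes.size(); i++) {
            if (i > 0) idx.append(",");
            idx.append(projectedIndexes.get(i));
        }
        headers.put("X-GP-ATTRS-PROJ-IDX", idx.toString());
        return headers;
    }

    public static void main(String[] args) {
        // A query touching the first and third columns of a three-column table:
        System.out.println(headersFor(Arrays.asList(0, 2)));
        // {X-GP-ATTRS-PROJ=2, X-GP-ATTRS-PROJ-IDX=0,2}
    }
}
{code}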





[jira] [Issue Comment Deleted] (HAWQ-927) Send Projection Info Data from HAWQ to PXF

2016-07-28 Thread Shivram Mani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivram Mani updated HAWQ-927:
--
Comment: was deleted

(was: {TABLE})

> Send Projection Info Data from HAWQ to PXF
> --
>
> Key: HAWQ-927
> URL: https://issues.apache.org/jira/browse/HAWQ-927
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: External Tables, PXF
>Reporter: Kavinder Dhaliwal
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> To achieve column projection at the level of PXF or the underlying readers we 
> need to first send this data as a Header/Param to PXF. Currently, PXF has no 
> knowledge whether a query requires all columns or a subset of columns.





[jira] [Commented] (HAWQ-927) Send Projection Info Data from HAWQ to PXF

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397966#comment-15397966
 ] 

Shivram Mani commented on HAWQ-927:
---

{TABLE}

> Send Projection Info Data from HAWQ to PXF
> --
>
> Key: HAWQ-927
> URL: https://issues.apache.org/jira/browse/HAWQ-927
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: External Tables, PXF
>Reporter: Kavinder Dhaliwal
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> To achieve column projection at the level of PXF or the underlying readers we 
> need to first send this data as a Header/Param to PXF. Currently, PXF has no 
> knowledge whether a query requires all columns or a subset of columns.





[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72675808
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCAccessor.java
 ---
@@ -0,0 +1,170 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hawq.pxf.api.FilterParser;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.commons.lang.StringUtils;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static 
org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter.PXF_HIVE_SERDES;
+
+/**
+ * Specialization of HiveAccessor for a Hive table that stores only ORC 
files.
+ * This class replaces the generic HiveAccessor for a case where a table 
is stored entirely as ORC files.
+ * Use together with {@link HiveInputFormatFragmenter}/{@link 
HiveColumnarSerdeResolver}
+ */
+public class HiveORCAccessor extends HiveAccessor {
+
+private RecordReader batchReader = null;
+private Reader reader = null;
+private VectorizedRowBatch batch = null;
+
+private final String READ_COLUMN_IDS_CONF_STR = 
"hive.io.file.readcolumn.ids";
+private final String READ_ALL_COLUMNS = 
"hive.io.file.read.all.columns";
+private final String READ_COLUMN_NAMES_CONF_STR = 
"hive.io.file.readcolumn.names";
+private final String SARG_PUSHDOWN = "sarg.pushdown";
+
+/**
+ * Constructs a HiveRCFileAccessor.
+ *
+ * @param input input containing user data
+ * @throws Exception if user data was wrong
+ */
+public HiveORCAccessor(InputData input) throws Exception {
+super(input, new OrcInputFormat());
+String[] toks = HiveInputFormatFragmenter.parseToks(input, 
PXF_HIVE_SERDES.COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.LAZY_BINARY_COLUMNAR_SERDE.name(), 
PXF_HIVE_SERDES.ORC_SERDE.name(), PXF_HIVE_SERDES.VECTORIZED_ORC_SERDE.name());
+initPartitionFields(toks[HiveInputFormatFragmenter.TOK_KEYS]);
+filterInFragmenter = new 
Boolean(toks[HiveInputFormatFragmenter.TOK_FILTER_DONE]);
+}
+
+@Override
+public boolean openForRead() throws Exception {
+addColumns();
+addFilters();
+return super.openForRead();
+}
+
+@Override
+protected Object getReader(JobConf jobConf, InputSplit split)
+throws IOException {
+return inputFormat.getRecordReader(split, jobConf, Reporter.NULL);
+}
+
+/**
+ * Adds the table tuple description to JobConf object
+ * so only these columns will be returned.
+ */
+private void addColumns() throws Exception {
+
+List colIds = new ArrayList();
+List colNames = new ArrayList();
+for(ColumnDescriptor col: inputData.getTupleDescription()) {
+if(col.isProjected()) {
+colIds.add(String.valueOf(col.columnIndex()));
+colNames.add(col.columnName());
+}
+}
+jobConf.set(READ_AL

[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread GodenYao
Github user GodenYao commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72674878
  
--- Diff: pxf/gradle.properties ---
@@ -23,4 +23,5 @@ hiveVersion=1.2.1
 hbaseVersionJar=1.1.2
 hbaseVersionRPM=1.1.2
 tomcatVersion=7.0.62
-pxfProtocolVersion=v14
\ No newline at end of file
+pxfProtocolVersion=v14
+orcVersion=1.1.1
--- End diff --

If we still use Hive APIs, why do we need orcVersion?




[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72674048
  
--- Diff: 
pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveORCAccessor.java
 ---
@@ -0,0 +1,170 @@
+package org.apache.hawq.pxf.plugins.hive;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
+import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hawq.pxf.api.FilterParser;
+import org.apache.hawq.pxf.api.OneRow;
+import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
+import org.apache.hawq.pxf.api.utilities.InputData;
+import org.apache.orc.Reader;
+import org.apache.orc.RecordReader;
+import org.apache.orc.TypeDescription;
+import org.apache.commons.lang.StringUtils;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static 
org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter.PXF_HIVE_SERDES;
+
+/**
+ * Specialization of HiveAccessor for a Hive table that stores only ORC 
files.
+ * This class replaces the generic HiveAccessor for a case where a table 
is stored entirely as ORC files.
+ * Use together with {@link HiveInputFormatFragmenter}/{@link 
HiveColumnarSerdeResolver}
+ */
+public class HiveORCAccessor extends HiveAccessor {
+
+private RecordReader batchReader = null;
+private Reader reader = null;
+private VectorizedRowBatch batch = null;
+
+private final String READ_COLUMN_IDS_CONF_STR = 
"hive.io.file.readcolumn.ids";
+private final String READ_ALL_COLUMNS = 
"hive.io.file.read.all.columns";
+private final String READ_COLUMN_NAMES_CONF_STR = 
"hive.io.file.readcolumn.names";
+private final String SARG_PUSHDOWN = "sarg.pushdown";
+
+/**
+ * Constructs a HiveRCFileAccessor.
--- End diff --

Spelling: should be `HiveORCAccessor`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #821: HAWQ-931. ORC optimized profile for PPD/CP

2016-07-28 Thread kavinderd
Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/821#discussion_r72673496
  
--- Diff: 
pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/utilities/ColumnDescriptor.java
 ---
@@ -89,11 +106,20 @@ public boolean isKeyColumn() {
 return RECORD_KEY_NAME.equalsIgnoreCase(gpdbColumnName);
 }
 
+public boolean isProjected() {
+return isProjected;
+}
+
+public void setProjected(boolean projected) {
+isProjected = projected;
+}
+
 @Override
public String toString() {
return "ColumnDescriptor [gpdbColumnTypeCode=" + 
gpdbColumnTypeCode
+ ", gpdbColumnName=" + gpdbColumnName
+ ", gpdbColumnTypeName=" + gpdbColumnTypeName
-   + ", gpdbColumnIndex=" + gpdbColumnIndex + "]";
+   + ", gpdbColumnIndex=" + gpdbColumnIndex
++ ", isProjected=" + isProjected + "]";
--- End diff --

Indent


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (HAWQ-583) Extend PXF to allow plugins to support returning partial content of SELECT(column projection)

2016-07-28 Thread Kavinder Dhaliwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397931#comment-15397931
 ] 

Kavinder Dhaliwal commented on HAWQ-583:


This is being implemented for ORC files via HAWQ-886

> Extend PXF to allow plugins to support returning partial content of 
> SELECT(column projection)
> -
>
> Key: HAWQ-583
> URL: https://issues.apache.org/jira/browse/HAWQ-583
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF
>Reporter: Michael Andre Pearce (IG)
>Assignee: Kavinder Dhaliwal
> Fix For: backlog
>
>
> Currently PXF supports being able to push down the predicate WHERE logic to 
> the external system to reduce the amount data needed to be retrieved.
> SELECT a, b FROM external_pxf_source WHERE z < 3 AND x > 6
> As such we can filter the rows returned, but currently still would have to 
> return all the fields / complete row.
> This proposal is so that we can return only the columns in SELECT part.
> For data sources where it is columnar storage or selectable such as remote 
> database that PXF can read or connect to this has advantages in the data that 
> needs to be accessed or even transferred.
> As like with the push down Filter it should be optional so that plugins that 
> provide support can use it but others that do not, continue to work as they 
> do.
> The proposal would be for
> 1) create an interface created for plugins to optionally implement, where the 
> columns needed to be returned are given to the plugin.
> 2) update pxf api for hawq to send columns defined in SELECT, for pxf to 
> invoke the plugin interface and pass this information onto if provided
> 3) update pxf integration within hawq itself so that hawq passes this 
> additonal  information to pxf.
> This Ticket is off the back of discussion on HAWQ-492.
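
For a concrete flavor of item (1), here is a minimal, hedged sketch (not the 
committed implementation) of a plugin consulting a per-column projection flag; 
it reuses the isProjected() accessor shown in the HAWQ-931 pull request diff 
above, and the method name below is purely illustrative:

{code:java}
import org.apache.hawq.pxf.api.utilities.ColumnDescriptor;
import org.apache.hawq.pxf.api.utilities.InputData;

import java.util.ArrayList;
import java.util.List;

public class ProjectionAwarePluginSketch {
    // Hedged sketch of proposal item (1): collect only the columns HAWQ asked
    // for in the SELECT list, using the per-column projection flag carried in
    // the tuple description (ColumnDescriptor.isProjected(), added in HAWQ-931).
    public static List<String> projectedColumnNames(InputData inputData) {
        List<String> names = new ArrayList<String>();
        for (ColumnDescriptor col : inputData.getTupleDescription()) {
            if (col.isProjected()) {
                names.add(col.columnName());
            }
        }
        return names;
    }
}
{code}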



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-965) Error reporting is misleading with incorrect table/file in PXF location in HA cluster

2016-07-28 Thread Shivram Mani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivram Mani updated HAWQ-965:
--
Description: 
When we create an external table with PXF protocol and use an incorrect path, 
we get a deceiving error about Standby namenode.

eg:
{code}
template1=# create external table hive_txt1 (a int, b text) location 
('pxf://singlecluster/testtxt?profile=Hive') format 'custom' 
(formatter='pxfwritable_import');
CREATE EXTERNAL TABLE
template1=# select * from hive_txt1;
  
ERROR:  Standby NameNode of HA nameservice singlecluster was not found after 
call to Active NameNode failed - failover aborted (pxfmasterapi.c:257)
{code}

  was:
When we create an external table with PXF protocol and use an incorrect path, 
we get a deceiving error about Standby namenode.

eg:
```
template1=# create external table hive_txt1 (a int, b text) location 
('pxf://singlecluster/testtxt?profile=Hive') format 'custom' 
(formatter='pxfwritable_import');
CREATE EXTERNAL TABLE
template1=# select * from hive_txt1;
  
ERROR:  Standby NameNode of HA nameservice singlecluster was not found after 
call to Active NameNode failed - failover aborted (pxfmasterapi.c:257)
```


> Error reporting is misleading with incorrect table/file in PXF location in HA 
> cluster
> -
>
> Key: HAWQ-965
> URL: https://issues.apache.org/jira/browse/HAWQ-965
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Reporter: Shivram Mani
>Assignee: Goden Yao
>
> When we create an external table with PXF protocol and use an incorrect path, 
> we get a deceiving error about Standby namenode.
> eg:
> {code}
> template1=# create external table hive_txt1 (a int, b text) location 
> ('pxf://singlecluster/testtxt?profile=Hive') format 'custom' 
> (formatter='pxfwritable_import');
> CREATE EXTERNAL TABLE
> template1=# select * from hive_txt1;  
>   
>   ERROR:  Standby NameNode of HA nameservice singlecluster was not found 
> after call to Active NameNode failed - failover aborted (pxfmasterapi.c:257)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-965) Error reporting is misleading with incorrect table/file in PXF location in HA cluster

2016-07-28 Thread Shivram Mani (JIRA)
Shivram Mani created HAWQ-965:
-

 Summary: Error reporting is misleading with incorrect table/file 
in PXF location in HA cluster
 Key: HAWQ-965
 URL: https://issues.apache.org/jira/browse/HAWQ-965
 Project: Apache HAWQ
  Issue Type: Bug
  Components: PXF
Reporter: Shivram Mani
Assignee: Goden Yao


When we create an external table with PXF protocol and use an incorrect path, 
we get a deceiving error about Standby namenode.

eg:
```
template1=# create external table hive_txt1 (a int, b text) location 
('pxf://singlecluster/testtxt?profile=Hive') format 'custom' 
(formatter='pxfwritable_import');
CREATE EXTERNAL TABLE
template1=# select * from hive_txt1;
  
ERROR:  Standby NameNode of HA nameservice singlecluster was not found after 
call to Active NameNode failed - failover aborted (pxfmasterapi.c:257)
```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-963) Enhance PXF to support additional operators

2016-07-28 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-963:
---
Fix Version/s: backlog

> Enhance PXF to support additional operators
> ---
>
> Key: HAWQ-963
> URL: https://issues.apache.org/jira/browse/HAWQ-963
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: PXF
>Reporter: Shivram Mani
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Supported operations in PXF only include
> <, >, <=, >=, =, !=. 
> Will need to add support for more operators in the PXF framework
> between(), in(), isNull(), etc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-962) Make catalog:type_sanity be able to run with other cases in parallel

2016-07-28 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-962:
---
Fix Version/s: 2.0.1.0-incubating

> Make catalog:type_sanity be able to run with other cases in parallel
> 
>
> Key: HAWQ-962
> URL: https://issues.apache.org/jira/browse/HAWQ-962
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Paul Guo
>Assignee: Lei Chang
> Fix For: 2.0.1.0-incubating
>
>
> The test case will query some database-level system tables while with 
> parallel  google testing is being enabled (see  HAWQ-955. Add scriptS for 
> feature test running in parallel.), the test could fail. We need to create a 
> new database in the test case to avoid this..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-964) Support for additional logical operators in PXF

2016-07-28 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-964:
---
Fix Version/s: backlog

> Support for additional logical operators in PXF
> ---
>
> Key: HAWQ-964
> URL: https://issues.apache.org/jira/browse/HAWQ-964
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: PXF
>Reporter: Shivram Mani
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Currently the extension framework only allows the 'AND' logical operator 
> across the provided predicates. Will need to support other logical operators 
> such as 'OR' 'NOT'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-931) HiveORCAccessor with support for Predicate pushdown and Column Projection

2016-07-28 Thread Shivram Mani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivram Mani updated HAWQ-931:
--
Description: 
HiveORCAccessor will be a subclass of the existing HiveAccessor with support 
for Predicate Pushdown and Column projection.
We will be using the job configuration object which is used by the ORC reader. 
This will still be a record based reader and not batch/vector based

We will  map the filter information passed from HAWQ via PXF into 
SearchArgument object and set in sarg.pushdown configuration property.
We will populate the column info passed from HAWQ (HAWQ-927) into the following 
configuration properties
hive.io.file.readcolumn.ids, hive.io.file.readcolumn.names. 
hive.io.file.read.all.columns will be set to false.




  was:
HiveORCAccessor will be a subclass of the existing HiveAccessor with support 
for Predicate Pushdown and Column projection.
We will be using the job configuration object which is used by the ORC reader.

We will  map the filter information passed from HAWQ via PXF into 
SearchArgument object and set in sarg.pushdown configuration property.
We will populate the column info passed from HAWQ (HAWQ-927) into the following 
configuration properties
hive.io.file.readcolumn.ids, hive.io.file.readcolumn.names. 
hive.io.file.read.all.columns will be set to false.



> HiveORCAccessor with support for Predicate pushdown and Column Projection
> -
>
> Key: HAWQ-931
> URL: https://issues.apache.org/jira/browse/HAWQ-931
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: PXF
>Reporter: Shivram Mani
>Assignee: Shivram Mani
> Fix For: backlog
>
>
> HiveORCAccessor will be a subclass of the existing HiveAccessor with support 
> for Predicate Pushdown and Column projection.
> We will be using the job configuration object which is used by the ORC 
> reader. This will still be a record based reader and not batch/vector based
> We will  map the filter information passed from HAWQ via PXF into 
> SearchArgument object and set in sarg.pushdown configuration property.
> We will populate the column info passed from HAWQ (HAWQ-927) into the 
> following configuration properties
> hive.io.file.readcolumn.ids, hive.io.file.readcolumn.names. 
> hive.io.file.read.all.columns will be set to false.
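
As a rough illustration of the configuration properties named above, a hedged 
sketch that mirrors the addColumns() logic visible in the HAWQ-931 pull request 
diff (class and column names below are illustrative, not the committed code):

{code:java}
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.mapred.JobConf;

import java.util.ArrayList;
import java.util.List;

public class OrcColumnProjectionSketch {
    // Populate the Hive reader properties from an already-computed list of
    // projected column ids/names, and turn off "read all columns".
    public static void applyProjection(JobConf jobConf,
                                       List<String> colIds, List<String> colNames) {
        jobConf.set("hive.io.file.read.all.columns", "false");
        jobConf.set("hive.io.file.readcolumn.ids", StringUtils.join(colIds, ","));
        jobConf.set("hive.io.file.readcolumn.names", StringUtils.join(colNames, ","));
    }

    public static void main(String[] args) {
        JobConf conf = new JobConf();
        List<String> ids = new ArrayList<String>();
        List<String> names = new ArrayList<String>();
        ids.add("0"); names.add("a");
        ids.add("1"); names.add("b");
        applyProjection(conf, ids, names);
        System.out.println(conf.get("hive.io.file.readcolumn.names")); // prints: a,b
    }
}
{code}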



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-964) Support for additional logical operators in PXF

2016-07-28 Thread Shivram Mani (JIRA)
Shivram Mani created HAWQ-964:
-

 Summary: Support for additional logical operators in PXF
 Key: HAWQ-964
 URL: https://issues.apache.org/jira/browse/HAWQ-964
 Project: Apache HAWQ
  Issue Type: Improvement
  Components: PXF
Reporter: Shivram Mani
Assignee: Goden Yao


Currently the extension framework only allows the 'AND' logical operator across 
the provided predicates. Will need to support other logical operators such as 
'OR' 'NOT'




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-931) HiveORCAccessor with support for Predicate pushdown and Column Projection

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397823#comment-15397823
 ] 

Shivram Mani commented on HAWQ-931:
---

The current framework only supports AND as a logical operator across predicates.
To be fixed in HAWQ-964.

> HiveORCAccessor with support for Predicate pushdown and Column Projection
> -
>
> Key: HAWQ-931
> URL: https://issues.apache.org/jira/browse/HAWQ-931
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: PXF
>Reporter: Shivram Mani
>Assignee: Shivram Mani
> Fix For: backlog
>
>
> HiveORCAccessor will be a subclass of the existing HiveAccessor with support 
> for Predicate Pushdown and Column projection.
> We will be using the job configuration object which is used by the ORC reader.
> We will  map the filter information passed from HAWQ via PXF into 
> SearchArgument object and set in sarg.pushdown configuration property.
> We will populate the column info passed from HAWQ (HAWQ-927) into the 
> following configuration properties
> hive.io.file.readcolumn.ids, hive.io.file.readcolumn.names. 
> hive.io.file.read.all.columns will be set to false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-931) HiveORCAccessor with support for Predicate pushdown and Column Projection

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397817#comment-15397817
 ] 

Shivram Mani commented on HAWQ-931:
---

Supported operators in PXF currently include only
<, >, <=, >=, =, !=.
We will need to add support for more operators in the PXF framework (HAWQ-963):
between(), in(), isNull()

> HiveORCAccessor with support for Predicate pushdown and Column Projection
> -
>
> Key: HAWQ-931
> URL: https://issues.apache.org/jira/browse/HAWQ-931
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: PXF
>Reporter: Shivram Mani
>Assignee: Shivram Mani
> Fix For: backlog
>
>
> HiveORCAccessor will be a subclass of the existing HiveAccessor with support 
> for Predicate Pushdown and Column projection.
> We will be using the job configuration object which is used by the ORC reader.
> We will  map the filter information passed from HAWQ via PXF into 
> SearchArgument object and set in sarg.pushdown configuration property.
> We will populate the column info passed from HAWQ (HAWQ-927) into the 
> following configuration properties
> hive.io.file.readcolumn.ids, hive.io.file.readcolumn.names. 
> hive.io.file.read.all.columns will be set to false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-963) Enhance PXF to support additional operators

2016-07-28 Thread Shivram Mani (JIRA)
Shivram Mani created HAWQ-963:
-

 Summary: Enhance PXF to support additional operators
 Key: HAWQ-963
 URL: https://issues.apache.org/jira/browse/HAWQ-963
 Project: Apache HAWQ
  Issue Type: Improvement
  Components: PXF
Reporter: Shivram Mani
Assignee: Goden Yao


Supported operations in PXF only include
<, >, <=, >=, =, !=. 
Will need to add support for more operators in the PXF framework
between(), in(), isNull(), etc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-931) HiveORCAccessor with support for Predicate pushdown and Column Projection

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397813#comment-15397813
 ] 

Shivram Mani commented on HAWQ-931:
---

The current approach is to use Hive 1.2.1's 
org.apache.hadoop.hive.ql.io.sarg.SearchArgument and 
org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory to build the predicate 
from the HAWQ predicate.
These APIs have changed with Hive 2.0 and will need to be revisited.
Specifically, functions on SearchArgument.Builder will also need the data type,
e.g. builder.lessThanEquals(filterColumnName, filterValue) will change to 
builder.lessThanEquals(filterColumnName, type, filterValue);
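
For illustration only, a minimal sketch of the two call shapes being compared; 
the column name and literal are made up, and serializing the SearchArgument 
into the sarg.pushdown property is elided:

{code:java}
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

public class SargBuilderSketch {
    public static void main(String[] args) {
        // Hive 1.2.1 style: the builder takes (column, literal) and infers the type.
        SearchArgument sarg = SearchArgumentFactory.newBuilder()
                .startAnd()
                .lessThanEquals("item_sku_id", 100L)   // illustrative column/value
                .end()
                .build();

        // With Hive 2.0 the same leaf is expected to take an explicit type, roughly:
        //   .lessThanEquals("item_sku_id", PredicateLeaf.Type.LONG, 100L)
        // so the accessor will have to map HAWQ column types to the ORC leaf types.
        System.out.println(sarg);
    }
}
{code}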

> HiveORCAccessor with support for Predicate pushdown and Column Projection
> -
>
> Key: HAWQ-931
> URL: https://issues.apache.org/jira/browse/HAWQ-931
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: PXF
>Reporter: Shivram Mani
>Assignee: Shivram Mani
> Fix For: backlog
>
>
> HiveORCAccessor will be a subclass of the existing HiveAccessor with support 
> for Predicate Pushdown and Column projection.
> We will be using the job configuration object which is used by the ORC reader.
> We will  map the filter information passed from HAWQ via PXF into 
> SearchArgument object and set in sarg.pushdown configuration property.
> We will populate the column info passed from HAWQ (HAWQ-927) into the 
> following configuration properties
> hive.io.file.readcolumn.ids, hive.io.file.readcolumn.names. 
> hive.io.file.read.all.columns will be set to false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-779) support more pxf filter pushdwon

2016-07-28 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397794#comment-15397794
 ] 

Shivram Mani commented on HAWQ-779:
---

I reverted the 2 commits from HAWQ-779 as they are not error-proof:
https://github.com/apache/incubator-hawq/commit/e150dc4e346bf471687e32c32f37c66896d302ec
https://github.com/apache/incubator-hawq/commit/1a17238d34cf66cdc52ea257bcdfdd0bdb497054

I would just patch the above issues (HAWQ-953, HAWQ-950) into the HAWQ-779 
branch itself and squash-merge them as one commit.

>  support more pxf filter pushdwon
> -
>
> Key: HAWQ-779
> URL: https://issues.apache.org/jira/browse/HAWQ-779
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: PXF
>Reporter: Devin Jia
>Assignee: Shivram Mani
> Fix For: 2.0.1.0-incubating
>
>
> When I use the pxf hawq, I need to read a traditional relational database 
> systems and solr by way of the external table. The project 
> :https://github.com/Pivotal-Field-Engineering/pxf-field/tree/master/jdbc-pxf-ext,
>  only "WriteAccessor ",so I developed 2 plug-ins, the projects: 
> https://github.com/inspur-insight/pxf-plugin , But these two plug-ins need to 
> modified HAWQ:
> 1. When get a list of fragment from pxf services, push down the 
> 'filterString'. modify the backend / optimizer / plan / createplan.c of 
> create_pxf_plan methods:
> segdb_work_map = map_hddata_2gp_segments (uri_str,
> total_segs, segs_participating,
> relation, ctx-> root-> parse-> jointree-> quals);
> 2. modify pxffilters.h and pxffilters.c, support TEXT types LIKE operation, 
> Date type data operator, Float type operator.
> 3. Modify org.apache.hawq.pxf.api.FilterParser.java, support the LIKE 
> operator.
> I already created a feature branch in my local ,and tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-hawq issue #808: HAWQ-944. Implement new pg_ltoa function as per p...

2016-07-28 Thread shivzone
Github user shivzone commented on the issue:

https://github.com/apache/incubator-hawq/pull/808
  
I don't see INT32_CHAR_SIZE defined anywhere


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq issue #817: HAWQ-954. Check that ExternalSelectDesc reference...

2016-07-28 Thread shivzone
Github user shivzone commented on the issue:

https://github.com/apache/incubator-hawq/pull/817
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq pull request #825: HAWQ-961. Dispatch session user id (not cu...

2016-07-28 Thread vikash686
GitHub user vikash686 opened a pull request:

https://github.com/apache/incubator-hawq/pull/825

HAWQ-961. Dispatch session user id (not current BOOTSTRAP_SUPERUSERID…

…) on master to segments

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-guo-/incubator-hawq crypto

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/825.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #825


commit 3121e3f9e3ab80bcaae53778bdfcffb28dd3c9bf
Author: Paul Guo 
Date:   2016-07-28T03:38:16Z

HAWQ-961. Dispatch session user id (not current BOOTSTRAP_SUPERUSERID) on 
master to segments




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq issue #808: HAWQ-944. Implement new pg_ltoa function as per p...

2016-07-28 Thread paul-guo-
Github user paul-guo- commented on the issue:

https://github.com/apache/incubator-hawq/pull/808
  
I suspect pg_ltoa() will be faster if it fills the characters one by one in 
descending order, instead of the current solution of filling in ascending order 
and then swapping.

Given that the code comes from pg, I'd give it a +1 for now.
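
To make the descending-order idea concrete, here is a small sketch (in Java 
rather than the C of pg_ltoa, purely for illustration) of writing digits from 
the end of a scratch buffer so that no second swap/reverse pass is needed:

{code:java}
public class LtoaSketch {
    static String ltoaDescending(int value) {
        char[] buf = new char[12];                // enough for "-2147483648"
        long v = value;                           // widen so negation is safe at MIN_VALUE
        boolean negative = v < 0;
        if (negative) {
            v = -v;
        }
        int pos = buf.length;
        do {
            buf[--pos] = (char) ('0' + (v % 10)); // least-significant digit first...
            v /= 10;
        } while (v != 0);                         // ...stored from the end of the buffer
        if (negative) {
            buf[--pos] = '-';
        }
        return new String(buf, pos, buf.length - pos);
    }

    public static void main(String[] args) {
        System.out.println(ltoaDescending(-2147483648)); // -2147483648
        System.out.println(ltoaDescending(12345));       // 12345
    }
}
{code}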


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq issue #818: HAWQ-955. Add scriptS for feature test running in...

2016-07-28 Thread paul-guo-
Github user paul-guo- commented on the issue:

https://github.com/apache/incubator-hawq/pull/818
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (HAWQ-947) set work_mem cannot work

2016-07-28 Thread Biao Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397285#comment-15397285
 ] 

Biao Wu commented on HAWQ-947:
--

ok,thanks

> set work_mem cannot work
> 
>
> Key: HAWQ-947
> URL: https://issues.apache.org/jira/browse/HAWQ-947
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 2.0.1.0-incubating
>Reporter: Biao Wu
>Assignee: Lei Chang
> Fix For: 2.0.1.0-incubating
>
>
> HAWQ version is 2.0.1.0 build dev.
> EXPLAIN ANALYZE:
> Work_mem: 9554K bytes max, 63834K bytes wanted。
> then set work_mem to '512MB',but not work
> {code:sql}
> test=# EXPLAIN ANALYZE SELECT count(DISTINCT item_sku_id)
> test-# FROM gdm_m03_item_sku_da
> test-# WHERE item_origin ='中国大陆';
>   
>   
>QUERY PLAN
> 
> 
>  Aggregate  (cost=54177150.69..54177150.70 rows=1 width=8)
>Rows out:  Avg 1.0 rows x 1 workers.  
> Max/Last(seg-1:BJHC-HEBE-9014.hadoop.jd.local/seg-1:BJHC-HEBE-9014.hadoop.jd.local)
>  1/1 rows with 532498/532498 ms to end, start offset by 201/201 ms.
>->  Gather Motion 306:1  (slice2; segments: 306)  
> (cost=54177147.60..54177150.68 rows=1 width=8)
>  Rows out:  Avg 306.0 rows x 1 workers at destination.  
> Max/Last(seg-1:BJHC-HEBE-9014.hadoop.jd.local/seg-1:BJHC-HEBE-9014.hadoop.jd.local)
>  306/306 rows with 529394/529394 ms to first row, 532498/532498 ms to end, 
> start offset b
> y 201/201 ms.
>  ->  Aggregate  (cost=54177147.60..54177147.61 rows=1 width=8)
>Rows out:  Avg 1.0 rows x 306 workers.  
> Max/Last(seg305:BJHC-HEBE-9031.hadoop.jd.local/seg258:BJHC-HEBE-9029.hadoop.jd.local)
>  1/1 rows with 530367/532274 ms to end, start offset by 396/246 ms.
>Executor memory:  9554K bytes avg, 9554K bytes max 
> (seg305:BJHC-HEBE-9031.hadoop.jd.local).
>Work_mem used:  9554K bytes avg, 9554K bytes max 
> (seg305:BJHC-HEBE-9031.hadoop.jd.local).
>Work_mem wanted: 63695K bytes avg, 63834K bytes max 
> (seg296:BJHC-HEBE-9031.hadoop.jd.local) to lessen workfile I/O affecting 306 
> workers.
>->  Redistribute Motion 306:306  (slice1; segments: 306)  
> (cost=0.00..53550018.97 rows=819776 width=11)
>  Hash Key: gdm_m03_item_sku_da.item_sku_id
>  Rows out:  Avg 820083.0 rows x 306 workers at 
> destination.  
> Max/Last(seg296:BJHC-HEBE-9031.hadoop.jd.local/seg20:BJHC-HEBE-9016.hadoop.jd.local)
>  821880/818660 rows with 769/771 ms to first row, 524681/525063 ms to e
> nd, start offset by 352/307 ms.
>  ->  Append-only Scan on gdm_m03_item_sku_da  
> (cost=0.00..48532990.00 rows=819776 width=11)
>Filter: item_origin::text = '中国大陆'::text
>Rows out:  Avg 820083.0 rows x 306 workers.  
> Max/Last(seg46:BJHC-HEBE-9017.hadoop.jd.local/seg5:BJHC-HEBE-9015.hadoop.jd.local)
>  893390/810582 rows with 28/127 ms to first row, 73062/526318 ms to end, 
> start off
> set by 354/458 ms.
>  Slice statistics:
>(slice0)Executor memory: 1670K bytes.
>(slice1)Executor memory: 3578K bytes avg x 306 workers, 4711K bytes 
> max (seg172:BJHC-HEBE-9024.hadoop.jd.local).
>(slice2)  * Executor memory: 10056K bytes avg x 306 workers, 10056K bytes 
> max (seg305:BJHC-HEBE-9031.hadoop.jd.local).  Work_mem: 9554K bytes max, 
> 63834K bytes wanted.
>  Statement statistics:
>Memory used: 262144K bytes
>Memory wanted: 64233K bytes
>  Settings:  default_hash_table_bucket_number=6
>  Dispatcher statistics:
>executors used(total/cached/new connection): (612/0/612); dispatcher 
> time(total/connection/dispatch data): (489.036 ms/192.741 ms/293.357 ms).
>dispatch data time(max/min/avg): (37.798 ms/0.011 ms/3.504 ms); consume 
> executor data time(max/min/avg): (0.016 ms/0.002 ms/0.005 ms); free executor 
> time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
>  Data locality statistics:
>data locality ratio: 0.864; virtual segment number: 306; different host 
> number: 17; virtual segment number per host(avg/min/max): (18/18/18); segment 
> size(avg/min/max): (3435087582.693 B/3391891296 B/3

[GitHub] incubator-hawq pull request #818: HAWQ-955. Add scriptS for feature test run...

2016-07-28 Thread xunzhang
Github user xunzhang commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/818#discussion_r72580564
  
--- Diff: src/test/feature/parallel-run-feature-test.sh ---
@@ -0,0 +1,49 @@
+#! /bin/bash
+
+if [ x$GPHOME == 'x' ]; then
+  echo "Please source greenplum_path.sh before running feature tests."
+  exit 0
+fi
+
+PSQL=${GPHOME}/bin/psql
+HAWQ_DB=${PGDATABASE:-"postgres"}
+HAWQ_HOST=${PGHOST:-"localhost"}
+HAWQ_PORT=${PGPORT:-"5432"}
--- End diff --

Done, please review again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (HAWQ-922) Add basic verification for various pl and udf in HAWQ

2016-07-28 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo closed HAWQ-922.


> Add basic verification for various pl and udf in HAWQ
> -
>
> Key: HAWQ-922
> URL: https://issues.apache.org/jira/browse/HAWQ-922
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Upgrade
>Affects Versions: 2.0.0.0-incubating
>Reporter: Ruilong Huo
>Assignee: Ruilong Huo
> Fix For: 2.0.1.0-incubating
>
>
> For HAWQ upgrade from 2.0.0.0 to 2.0.1.0 related tasks, we need to add basic 
> data verification for hawq upgrade, including procedural languages, 
> user-defined functions, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HAWQ-922) Add basic verification for various pl and udf in HAWQ

2016-07-28 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo resolved HAWQ-922.
--
Resolution: Fixed

> Add basic verification for various pl and udf in HAWQ
> -
>
> Key: HAWQ-922
> URL: https://issues.apache.org/jira/browse/HAWQ-922
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Upgrade
>Affects Versions: 2.0.0.0-incubating
>Reporter: Ruilong Huo
>Assignee: Ruilong Huo
> Fix For: 2.0.1.0-incubating
>
>
> For HAWQ upgrade from 2.0.0.0 to 2.0.1.0 related tasks, we need to add basic 
> data verification for hawq upgrade, including procedural languages, 
> user-defined functions, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-hawq pull request #814: HAWQ-922. Add basic verification for vario...

2016-07-28 Thread huor
Github user huor closed the pull request at:

https://github.com/apache/incubator-hawq/pull/814


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (HAWQ-922) Add basic verification for various pl and udf in HAWQ

2016-07-28 Thread Ruilong Huo (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397214#comment-15397214
 ] 

Ruilong Huo commented on HAWQ-922:
--

The basic verification for procedural languages and user-defined functions has 
been added.

> Add basic verification for various pl and udf in HAWQ
> -
>
> Key: HAWQ-922
> URL: https://issues.apache.org/jira/browse/HAWQ-922
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Upgrade
>Affects Versions: 2.0.0.0-incubating
>Reporter: Ruilong Huo
>Assignee: Ruilong Huo
> Fix For: 2.0.1.0-incubating
>
>
> For HAWQ upgrade from 2.0.0.0 to 2.0.1.0 related tasks, we need to add basic 
> data verification for hawq upgrade, including procedural languages, 
> user-defined functions, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-947) set work_mem cannot work

2016-07-28 Thread Lei Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397169#comment-15397169
 ] 

Lei Chang commented on HAWQ-947:


We should not configure work_mem anymore; instead, we can use resource 
queues to configure the memory used.

> set work_mem cannot work
> 
>
> Key: HAWQ-947
> URL: https://issues.apache.org/jira/browse/HAWQ-947
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 2.0.1.0-incubating
>Reporter: Biao Wu
>Assignee: Lei Chang
> Fix For: 2.0.1.0-incubating
>
>
> HAWQ version is 2.0.1.0 build dev.
> EXPLAIN ANALYZE:
> Work_mem: 9554K bytes max, 63834K bytes wanted。
> then set work_mem to '512MB',but not work
> {code:sql}
> test=# EXPLAIN ANALYZE SELECT count(DISTINCT item_sku_id)
> test-# FROM gdm_m03_item_sku_da
> test-# WHERE item_origin ='中国大陆';
>   
>   
>QUERY PLAN
> 
> 
>  Aggregate  (cost=54177150.69..54177150.70 rows=1 width=8)
>Rows out:  Avg 1.0 rows x 1 workers.  
> Max/Last(seg-1:BJHC-HEBE-9014.hadoop.jd.local/seg-1:BJHC-HEBE-9014.hadoop.jd.local)
>  1/1 rows with 532498/532498 ms to end, start offset by 201/201 ms.
>->  Gather Motion 306:1  (slice2; segments: 306)  
> (cost=54177147.60..54177150.68 rows=1 width=8)
>  Rows out:  Avg 306.0 rows x 1 workers at destination.  
> Max/Last(seg-1:BJHC-HEBE-9014.hadoop.jd.local/seg-1:BJHC-HEBE-9014.hadoop.jd.local)
>  306/306 rows with 529394/529394 ms to first row, 532498/532498 ms to end, 
> start offset b
> y 201/201 ms.
>  ->  Aggregate  (cost=54177147.60..54177147.61 rows=1 width=8)
>Rows out:  Avg 1.0 rows x 306 workers.  
> Max/Last(seg305:BJHC-HEBE-9031.hadoop.jd.local/seg258:BJHC-HEBE-9029.hadoop.jd.local)
>  1/1 rows with 530367/532274 ms to end, start offset by 396/246 ms.
>Executor memory:  9554K bytes avg, 9554K bytes max 
> (seg305:BJHC-HEBE-9031.hadoop.jd.local).
>Work_mem used:  9554K bytes avg, 9554K bytes max 
> (seg305:BJHC-HEBE-9031.hadoop.jd.local).
>Work_mem wanted: 63695K bytes avg, 63834K bytes max 
> (seg296:BJHC-HEBE-9031.hadoop.jd.local) to lessen workfile I/O affecting 306 
> workers.
>->  Redistribute Motion 306:306  (slice1; segments: 306)  
> (cost=0.00..53550018.97 rows=819776 width=11)
>  Hash Key: gdm_m03_item_sku_da.item_sku_id
>  Rows out:  Avg 820083.0 rows x 306 workers at 
> destination.  
> Max/Last(seg296:BJHC-HEBE-9031.hadoop.jd.local/seg20:BJHC-HEBE-9016.hadoop.jd.local)
>  821880/818660 rows with 769/771 ms to first row, 524681/525063 ms to e
> nd, start offset by 352/307 ms.
>  ->  Append-only Scan on gdm_m03_item_sku_da  
> (cost=0.00..48532990.00 rows=819776 width=11)
>Filter: item_origin::text = '中国大陆'::text
>Rows out:  Avg 820083.0 rows x 306 workers.  
> Max/Last(seg46:BJHC-HEBE-9017.hadoop.jd.local/seg5:BJHC-HEBE-9015.hadoop.jd.local)
>  893390/810582 rows with 28/127 ms to first row, 73062/526318 ms to end, 
> start off
> set by 354/458 ms.
>  Slice statistics:
>(slice0)Executor memory: 1670K bytes.
>(slice1)Executor memory: 3578K bytes avg x 306 workers, 4711K bytes 
> max (seg172:BJHC-HEBE-9024.hadoop.jd.local).
>(slice2)  * Executor memory: 10056K bytes avg x 306 workers, 10056K bytes 
> max (seg305:BJHC-HEBE-9031.hadoop.jd.local).  Work_mem: 9554K bytes max, 
> 63834K bytes wanted.
>  Statement statistics:
>Memory used: 262144K bytes
>Memory wanted: 64233K bytes
>  Settings:  default_hash_table_bucket_number=6
>  Dispatcher statistics:
>executors used(total/cached/new connection): (612/0/612); dispatcher 
> time(total/connection/dispatch data): (489.036 ms/192.741 ms/293.357 ms).
>dispatch data time(max/min/avg): (37.798 ms/0.011 ms/3.504 ms); consume 
> executor data time(max/min/avg): (0.016 ms/0.002 ms/0.005 ms); free executor 
> time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
>  Data locality statistics:
>data locality ratio: 0.864; virtual segment number: 306; different host 
> number: 17; virtual segme