[jira] [Updated] (DRILL-4039) Query fails when non-ascii characters are used in string literals
[ https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyun Liu updated DRILL-4039:
-----------------------------
    Attachment: DRILL-4039.patch.txt

> Query fails when non-ascii characters are used in string literals
> -----------------------------------------------------------------
>
>              Key: DRILL-4039
>              URL: https://issues.apache.org/jira/browse/DRILL-4039
>          Project: Apache Drill
>       Issue Type: Bug
>       Components: Client - JDBC
> Affects Versions: 1.1.0
>      Environment: Linux lnxx64r6 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>         Reporter: Sergio Lob
>      Attachments: DRILL-4039.patch.txt
>
> The following query against DRILL returns this error:
> SYSTEM ERROR: CalciteException: Failed to encode 'НАСТРОЕние' in character set 'ISO-8859-1'
> cc39118a-cde6-4a6e-a1d6-4b6b7e847b8a on maprd
> Query is:
> SELECT
>     T1.`F01INT`,
>     T1.`F02UCHAR_10`,
>     T1.`F03UVARCHAR_10`
> FROM
>     DPRV64R6_TRDUNI01T T1
> WHERE
>     (T1.`F03UVARCHAR_10` = 'НАСТРОЕние')
> ORDER BY
>     T1.`F01INT`;
> This issue looks similar to jira HIVE-12207.
> Is there a fix or workaround for this?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-4039) Query fails when non-ascii characters are used in string literals
[ https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151921#comment-15151921 ]

liyun Liu commented on DRILL-4039:
----------------------------------
Jingguo, the patch I submitted will also fix the problem you mentioned about 'show tables' and 'describe'.

> Query fails when non-ascii characters are used in string literals
> -----------------------------------------------------------------
>
>              Key: DRILL-4039
>              URL: https://issues.apache.org/jira/browse/DRILL-4039
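The error in this issue comes down to a charset mismatch: Calcite historically validates string literals against ISO-8859-1, which has no representation for Cyrillic. The following is a minimal, hypothetical illustration of why the literal is rejected — it is not Drill or Calcite code, just the underlying JDK charset behavior:

```java
import java.nio.charset.Charset;

class CharsetCheck {
    public static void main(String[] args) {
        // Calcite validates string literals against its default charset
        // (historically ISO-8859-1); Cyrillic text cannot be encoded there,
        // while UTF-8 handles it fine.
        Charset latin1 = Charset.forName("ISO-8859-1");
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println(latin1.newEncoder().canEncode("НАСТРОЕние")); // false
        System.out.println(utf8.newEncoder().canEncode("НАСТРОЕние"));  // true
    }
}
```

A commonly reported workaround (unverified against this exact Drill version) is overriding Calcite's default charset via the `saffron.default.charset` property so that literal validation uses a Unicode-capable charset.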
[jira] [Commented] (DRILL-4257) Ensure shutting down a Drillbit also shuts down all StoragePlugins
[ https://issues.apache.org/jira/browse/DRILL-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151944#comment-15151944 ]

Khurram Faraaz commented on DRILL-4257:
---------------------------------------
Do we have a unit test to verify this? Is there a way to verify from drillbit.log that all StoragePlugins are also shut down once a Drillbit is shut down?

> Ensure shutting down a Drillbit also shuts down all StoragePlugins
> ------------------------------------------------------------------
>
>              Key: DRILL-4257
>              URL: https://issues.apache.org/jira/browse/DRILL-4257
>          Project: Apache Drill
>       Issue Type: Bug
>       Components: Execution - Flow
>         Reporter: Jacques Nadeau
>         Assignee: Jacques Nadeau
>          Fix For: 1.5.0
>
> Right now, if a StoragePlugin implementation relies on its close method to clean up resources, those resources won't be cleaned up when the Drillbit class is shut down. This is because Drillbit doesn't actually close the StoragePluginRegistry and its associated resources. This causes resource leaks in tests.
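The idea behind the fix can be sketched as a registry that is itself closeable, so a Drillbit shutdown can propagate `close()` to every registered plugin. This is a hypothetical sketch — the class and method names are illustrative, not Drill's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a plugin registry that closes all registered
// StoragePlugins when it is closed, remembering the first failure so
// one bad plugin does not prevent the others from shutting down.
class PluginRegistrySketch implements AutoCloseable {
    interface StoragePlugin extends AutoCloseable {}

    private final List<StoragePlugin> plugins = new ArrayList<>();

    void register(StoragePlugin p) {
        plugins.add(p);
    }

    @Override
    public void close() throws Exception {
        Exception first = null;
        for (StoragePlugin p : plugins) {
            try {
                p.close();                       // close every plugin
            } catch (Exception e) {
                if (first == null) first = e;    // keep first failure
            }
        }
        if (first != null) throw first;
    }
}
```

With this shape, the unit test Khurram asks about could simply register a plugin with a "closed" flag, close the registry, and assert the flag was set.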
[jira] [Commented] (DRILL-4241) Add Experimental Kudu plugin
[ https://issues.apache.org/jira/browse/DRILL-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152065#comment-15152065 ]

ASF GitHub Bot commented on DRILL-4241:
---------------------------------------
Github user tdunning commented on the pull request:

    https://github.com/apache/drill/pull/314#issuecomment-185636190

    How could this have been merged? There is a huge double standard going on here. This code has NO comments. No tests. No documentation. No design. It isn't nearly good enough to pass the reviews that are required for others to contribute code. How can it be merged without any kind of significant review?

> Add Experimental Kudu plugin
> ----------------------------
>
>              Key: DRILL-4241
>              URL: https://issues.apache.org/jira/browse/DRILL-4241
>          Project: Apache Drill
>       Issue Type: New Feature
>       Components: Storage - Other
>         Reporter: Jacques Nadeau
>         Assignee: Jacques Nadeau
>          Fix For: 1.5.0
>
> Merge the work done here into Drill master so others can utilize the plugin:
> https://github.com/dremio/drill-storage-kudu
[jira] [Updated] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table
[ https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-3688:
------------------------------------
    Description:

Currently Drill does not honor the "skip.header.line.count" attribute of Hive tables. It may also cause format conversion issues.

Reproduce:

1. Create a Hive table
{code}
create table h1db.testheader(col0 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
tblproperties("skip.header.line.count"="1");
{code}

2. Prepare sample data:
{code}
# cat test.data
col0
2015-01-01
{code}

3. Load the sample data into Hive:
{code}
LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
{code}

4. Hive:
{code}
hive> select * from h1db.testheader ;
OK
2015-01-01
Time taken: 0.254 seconds, Fetched: 1 row(s)
{code}

5. Drill:
{code}
> select * from hive.h1db.testheader ;
+-------------+
| col0        |
+-------------+
| col0        |
| 2015-01-01  |
+-------------+
2 rows selected (0.257 seconds)

> select cast(col0 as date) from hive.h1db.testheader ;
Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must be in the range [1,12]
Fragment 0:0
[Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]
  (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be in the range [1,12]
    org.joda.time.field.FieldUtils.verifyValueBounds():236
    org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
    org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
    org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
    org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
    org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
    org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
    org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
    org.apache.drill.exec.record.AbstractRecordBatch.next():147
    org.apache.drill.exec.physical.impl.BaseRootExec.next():83
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79
    org.apache.drill.exec.physical.impl.BaseRootExec.next():73
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1566
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():255
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
{code}

Also "skip.footer.line.count" should be taken into account. If "skip.header.line.count" or "skip.footer.line.count" has an incorrect value in Hive, Drill should throw an appropriate exception, e.g.: Hive table property skip.header.line.count value 'someValue' is non-numeric.
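The requested behavior — honoring `skip.header.line.count` and `skip.footer.line.count` — can be sketched as dropping the first H and last F lines of a table's text data. This is an illustrative sketch of the intended semantics only; the real fix lives in Drill's Hive record readers and the names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: apply skip.header.line.count (header) and
// skip.footer.line.count (footer) to a list of text lines.
class SkipLinesSketch {
    static List<String> skip(List<String> lines, int header, int footer) {
        // If header+footer consume the whole input, nothing remains to read.
        if (header < 0 || footer < 0 || header + footer >= lines.size()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(lines.subList(header, lines.size() - footer));
    }
}
```

With the sample data above (`col0`, `2015-01-01`) and `skip.header.line.count=1`, only the `2015-01-01` row survives, so `cast(col0 as date)` no longer sees the header text.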
[jira] [Commented] (DRILL-4241) Add Experimental Kudu plugin
[ https://issues.apache.org/jira/browse/DRILL-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152367#comment-15152367 ]

ASF GitHub Bot commented on DRILL-4241:
---------------------------------------
Github user jacques-n commented on the pull request:

    https://github.com/apache/drill/pull/314#issuecomment-185738887

    The Kudu plugin was contributed by six different developers, three of whom are Drill PMC members and two more of whom are PMC members of other Apache projects. The code remained available for five days for review and received two +1s and no negative feedback. It is modeled after the HBase plugin and works the same way. It is unfortunate that there aren't integrated tests (there wasn't an easy way to provide them, such as a mini hbase cluster), but it was and is regularly manually tested. Due to the light testing, we are communicating it as experimental to users. Suggesting it didn't go through review when you have that large a group of developers involved is weird. Assuming that the user API isn't in dispute (which no one here disputed, most likely because Kudu looks exactly like an Oracle table), providing experimental plugins increases the breadth of Drill's appeal and thus broadens and strengthens the community.

> Add Experimental Kudu plugin
> ----------------------------
>
>              Key: DRILL-4241
>              URL: https://issues.apache.org/jira/browse/DRILL-4241
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152497#comment-15152497 ]

ASF GitHub Bot commented on DRILL-4387:
---------------------------------------
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/379#discussion_r53335185

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/GroupScan.java ---
    @@ -35,6 +35,8 @@ public interface GroupScan extends Scan, HasAffinity{
       public static final List ALL_COLUMNS = ImmutableList.of(SchemaPath.getSimplePath("*"));
    +  public static final List EMPTY_COLUMNS = ImmutableList.of();
    --- End diff --

    This static constant does not seem to be referenced anywhere?

> Improve execution side when it handles skipAll query
> ----------------------------------------------------
>
>              Key: DRILL-4387
>              URL: https://issues.apache.org/jira/browse/DRILL-4387
>          Project: Apache Drill
>       Issue Type: Bug
>         Reporter: Jinfeng Ni
>         Assignee: Jinfeng Ni
>          Fix For: 1.6.0
>
> DRILL-4279 changed the planner side and the RecordReader on the execution side where they handle skipAll queries. However, there seem to be other places in the codebase that do not handle skipAll queries efficiently. In particular, in GroupScan or ScanBatchCreator, we replace a NULL or empty column list with the star column. This essentially forces the execution side (RecordReader) to fetch all the columns from the data source. Such behavior leads to a big performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as a follow-up to DRILL-4279.
> One simple example of this problem is:
> {code}
> SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`;
> {code}
> The query does not require any regular column from the parquet file. However, ParquetRowGroupScan and ParquetScanBatchCreator will put the star column in the column list. In case the table has dozens or hundreds of columns, this makes the SCAN operator much more expensive than necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Closed] (DRILL-4152) Add additional logging and metrics to the Parquet reader
[ https://issues.apache.org/jira/browse/DRILL-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dechang Gu closed DRILL-4152.
-----------------------------
Verified with logging turned on; it works.

> Add additional logging and metrics to the Parquet reader
> --------------------------------------------------------
>
>              Key: DRILL-4152
>              URL: https://issues.apache.org/jira/browse/DRILL-4152
>          Project: Apache Drill
>       Issue Type: Bug
>       Components: Storage - Parquet
>         Reporter: Parth Chandra
>         Assignee: Parth Chandra
>          Fix For: 1.5.0
>
> In some cases, we see the Parquet reader as the bottleneck when reading from the file system. RWSpeedTest is able to read 10x faster than the Parquet reader, so reading from disk is not the issue. This issue is to add more instrumentation to the Parquet reader so speed bottlenecks can be better diagnosed.
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152573#comment-15152573 ]

ASF GitHub Bot commented on DRILL-4387:
---------------------------------------
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/379#discussion_r53340898

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
    @@ -87,9 +87,6 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
             newColumns.add(column);
           }
         }
    -    if (newColumns.isEmpty()) {
    --- End diff --

    So, to clarify, the reason you removed the check for newColumns.isEmpty() is that if the column list is empty, the underlying ParquetRecordReader will handle it correctly by producing 1 default column (probably a NullableInt column)? Was this check for isEmpty() only present in the Parquet scan, or do other readers need modification too? I think it would be good to add comments about how the NULL and empty column lists are being handled by each data source.

> Improve execution side when it handles skipAll query
> ----------------------------------------------------
>
>              Key: DRILL-4387
>              URL: https://issues.apache.org/jira/browse/DRILL-4387
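The behavior under discussion — widening a NULL or empty column list into the star column — can be shown with a minimal sketch. The constant and method names here are illustrative, not Drill's actual code (Drill's `GroupScan.ALL_COLUMNS` is a `SchemaPath` list):

```java
import java.util.Collections;
import java.util.List;

// Minimal sketch of the pre-DRILL-4279 behavior the review discusses:
// a NULL or empty column list was replaced with the star column, which
// forces the reader to materialize every column in the table.
class ColumnListSketch {
    static final String STAR = "*";

    static List<String> normalizeOld(List<String> columns) {
        if (columns == null || columns.isEmpty()) {
            return Collections.singletonList(STAR); // old: fetch everything
        }
        return columns;
    }
}
```

The fix discussed in the thread is to stop doing this normalization and instead let the reader treat an empty list as "no regular columns needed", producing a single default column rather than scanning the whole schema.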
[jira] [Closed] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dechang Gu closed DRILL-4256.
-----------------------------
Verified by Rahul.

> Performance regression in hive planning
> ---------------------------------------
>
>              Key: DRILL-4256
>              URL: https://issues.apache.org/jira/browse/DRILL-4256
>          Project: Apache Drill
>       Issue Type: Bug
>       Components: Functions - Hive, Query Planning & Optimization
> Affects Versions: 1.5.0
>         Reporter: Rahul Challapalli
>         Assignee: Venki Korukanti
>          Fix For: 1.5.0
>      Attachments: jstack.tgz
>
> Commit #: 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading hive tables backed by hbase caused a performance regression. The data set used in the test below has ~3700 partitions, and the filter in the query ensures only 1 partition gets selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query  : explain plan for select count(*) from lineitem_partitioned where `year`=2015 and `month`=1 and `day`=1;
> Time   : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query  : explain plan for select count(*) from lineitem_partitioned where `year`=2015 and `month`=1 and `day`=1;
> Time   : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you need additional information.
[jira] [Closed] (DRILL-4380) Fix performance regression: in creation of FileSelection in ParquetFormatPlugin to not set files if metadata cache is available.
[ https://issues.apache.org/jira/browse/DRILL-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dechang Gu closed DRILL-4380.
-----------------------------
Verified. LGTM.

> Fix performance regression: in creation of FileSelection in ParquetFormatPlugin to not set files if metadata cache is available.
> --------------------------------------------------------------------------------------------------------------------------------
>
>              Key: DRILL-4380
>              URL: https://issues.apache.org/jira/browse/DRILL-4380
>          Project: Apache Drill
>       Issue Type: Bug
>         Reporter: Parth Chandra
>          Fix For: 1.5.0
>
> The regression has been caused by the changes in 367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over empty folders consistently so that they report table not found rather than failing.)
> In ParquetFormatPlugin, the original code created a FileSelection object in the following code:
> {code}
> return new FileSelection(fileNames, metaRootPath.toString(), metadata, selection.getFileStatusList(fs));
> {code}
> The selection.getFileStatusList call made an inexpensive call to FileSelection.init(). The call was inexpensive because the FileSelection.files member was not set, so the code did not need to make an expensive call to get the file statuses corresponding to the files in the FileSelection.files member.
> In the new code, this is replaced by:
> {code}
> final FileSelection newSelection = FileSelection.create(null, fileNames, metaRootPath.toString());
> return ParquetFileSelection.create(newSelection, metadata);
> {code}
> This sets the FileSelection.files member but not the FileSelection.statuses member. A subsequent call to FileSelection.getStatuses (in ParquetGroupScan()) now makes an expensive call to get all the statuses.
> It appears that there was an implicit assumption that the FileSelection.statuses member should be set before the FileSelection.files member is set. This assumption is no longer true.
[jira] [Created] (DRILL-4413) Improve FrameSupportTemplate to do the setup only when necessary
Deneche A. Hakim created DRILL-4413:
------------------------------------

             Summary: Improve FrameSupportTemplate to do the setup only when necessary
                 Key: DRILL-4413
                 URL: https://issues.apache.org/jira/browse/DRILL-4413
             Project: Apache Drill
          Issue Type: Sub-task
          Components: Execution - Relational Operators
    Affects Versions: 1.4.0
            Reporter: Deneche A. Hakim

The current implementation of FrameSupportTemplate does some setup at the beginning of every partition. We shouldn't need to redo the setup until the batch changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-4413) Improve FrameSupportTemplate to do the setup only when necessary
[ https://issues.apache.org/jira/browse/DRILL-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152640#comment-15152640 ]

ASF GitHub Bot commented on DRILL-4413:
---------------------------------------
Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/340#discussion_r53347747

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/FrameSupportTemplate.java ---
    @@ -134,44 +142,67 @@ private void cleanPartition() {
        * @throws DrillException if it can't write into the container
        */
       private int processPartition(final int currentRow) throws DrillException {
    -    logger.trace("process partition {}, currentRow: {}, outputCount: {}", partition, currentRow, outputCount);
    +    logger.trace("{} rows remaining to process, currentRow: {}, outputCount: {}", remainingRows, currentRow, outputCount);
         setupWriteFirstValue(internal, container);
    -    int row = currentRow;
    +    if (popConfig.isRows()) {
    +      return processROWS(currentRow);
    +    } else {
    +      return processRANGE(currentRow);
    +    }
    +  }
    +
    +  private int processROWS(int row) throws DrillException {
    +    //TODO we only need to call these once per batch
    --- End diff --

    We do the setup at the beginning of every partition. In case we have multiple partitions in the same batch, setup should only be done once. To make matters more complicated, if we are aggregating a single partition that spans multiple batches, we also need to do the setup for every batch. The TODO is still valid. I created [DRILL-4413](https://issues.apache.org/jira/browse/DRILL-4413) to keep track of it.

> Improve FrameSupportTemplate to do the setup only when necessary
> ----------------------------------------------------------------
>
>              Key: DRILL-4413
>              URL: https://issues.apache.org/jira/browse/DRILL-4413
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152676#comment-15152676 ]

ASF GitHub Bot commented on DRILL-4410:
---------------------------------------
Github user jaltekruse commented on the pull request:

    https://github.com/apache/drill/pull/380#issuecomment-185825063

    Could you also add result verification? While there are some older tests that just run queries to verify that errors that previously occurred are gone, we have been enforcing, since the test builder was added, that new tests verify their results. You can use the test builder to add records in a loop to the expected result set.

> ListVector causes OversizedAllocationException
> ----------------------------------------------
>
>              Key: DRILL-4410
>              URL: https://issues.apache.org/jira/browse/DRILL-4410
>          Project: Apache Drill
>       Issue Type: Bug
>       Components: Server
>         Reporter: MinJi Kim
>         Assignee: MinJi Kim
>
> Reading a large data set with an array/list causes the following problem. This happens when union type is enabled.
> (org.apache.drill.exec.exception.OversizedAllocationException) Unable to expand the buffer. Max allowed buffer size is reached.
>   org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
>   org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406
>   org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298
>   org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307
>   org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
>   org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115
>   org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100
>   org.apache.drill.exec.vector.complex.ListVector.copyFrom():97
>   org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89
>   org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356
>   org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173
>   org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223
>   org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233
>   org.apache.drill.exec.record.AbstractRecordBatch.next():162
>   org.apache.drill.exec.record.AbstractRecordBatch.next():119
>   org.apache.drill.exec.record.AbstractRecordBatch.next():109
>   org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>   org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
>   org.apache.drill.exec.record.AbstractRecordBatch.next():162
>   org.apache.drill.exec.record.AbstractRecordBatch.next():119
>   org.apache.drill.exec.record.AbstractRecordBatch.next():109
>   org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>   org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
>   org.apache.drill.exec.record.AbstractRecordBatch.next():162
>   org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>   org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>   org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>   org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
>   org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
>   java.security.AccessController.doPrivileged():-2
>   javax.security.auth.Subject.doAs():422
>   org.apache.hadoop.security.UserGroupInformation.doAs():1657
>   org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
>   org.apache.drill.common.SelfCleaningRunnable.run():38
>   java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>   java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>   java.lang.Thread.run():745 (state=,code=0)
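The failure mode in the stack trace above — `reAlloc()` throwing once a vector's buffer can grow no further — comes from the grow-by-doubling strategy Drill's value vectors use. A hedged sketch of that mechanism (the cap value and names are illustrative, not Drill's exact constants):

```java
// Sketch of why reallocation fails: buffers grow by doubling, and once the
// doubled size would exceed the per-vector cap, the grow operation throws
// instead of expanding. Constants mirror the idea, not Drill's exact values.
class ReallocSketch {
    static final long MAX_BUFFER_SIZE = 1L << 31; // illustrative cap

    static long reAlloc(long currentSize) {
        long newSize = currentSize * 2; // grow-by-doubling
        if (newSize > MAX_BUFFER_SIZE) {
            throw new IllegalStateException(
                "Unable to expand the buffer. Max allowed buffer size is reached.");
        }
        return newSize;
    }
}
```

This is why the bug surfaces only on large data sets: many small lists fit, but one sufficiently long run of values pushes the offset/bits buffers past the cap in a single doubling step.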
[jira] [Commented] (DRILL-4256) Performance regression in hive planning
[ https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152682#comment-15152682 ]

Rahul Challapalli commented on DRILL-4256:
------------------------------------------
[~dgu-atmapr] This is not closed, as we did not automate the fix using the performance framework.

> Performance regression in hive planning
> ---------------------------------------
>
>              Key: DRILL-4256
>              URL: https://issues.apache.org/jira/browse/DRILL-4256
[jira] [Commented] (DRILL-4411) HashJoin should not only depend on number of records, but also on size
[ https://issues.apache.org/jira/browse/DRILL-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152698#comment-15152698 ]

ASF GitHub Bot commented on DRILL-4411:
---------------------------------------
GitHub user minji-kim opened a pull request:

    https://github.com/apache/drill/pull/381

    DRILL-4411: hash join should limit batch based on size and number of records

    Right now, hash joins can run out of memory if records are large, since each batch is limited only by record count (4000). This patch implements a simple heuristic: if the allocator for the outputs becomes larger than 10 MB before outputting 4000 records (say, at 2000), then set the batch size limit to 2000 for future batches.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/minji-kim/drill DRILL-4411

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/381.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #381

commit 2e3b1c75273e1b87679d79bdc4f3877b72603e3c
Author: Minji Kim
Date: 2016-02-18T17:05:51Z

    DRILL-4411: hash join should limit batch based on size as well as number of records

> HashJoin should not only depend on number of records, but also on size
> ----------------------------------------------------------------------
>
>              Key: DRILL-4411
>              URL: https://issues.apache.org/jira/browse/DRILL-4411
>          Project: Apache Drill
>       Issue Type: Bug
>       Components: Server
>         Reporter: MinJi Kim
>         Assignee: MinJi Kim
>
> In HashJoinProbeTemplate, each batch is limited to TARGET_RECORDS_PER_BATCH (4000). But we should not only depend on the number of records, but also on size (in case of extremely large records).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
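The heuristic described in the pull request can be sketched in a few lines: start with a record-count limit of 4000, and if the output allocator crosses 10 MB before that many records are produced, lower the limit to the count reached so far. This is a hedged sketch of the idea; field and method names are illustrative, not the patch's actual code:

```java
// Sketch of the DRILL-4411 heuristic: shrink the per-batch record limit
// when the bytes consumed by a batch exceed a size cap before the record
// limit is reached (i.e., when individual records are very large).
class BatchLimitSketch {
    static final int TARGET_RECORDS_PER_BATCH = 4000;
    static final long MAX_BATCH_BYTES = 10L * 1024 * 1024; // 10 MB

    private int recordLimit = TARGET_RECORDS_PER_BATCH;

    /** Observe a finished batch; returns the limit to apply to future batches. */
    int update(int recordsInBatch, long allocatedBytes) {
        if (allocatedBytes > MAX_BATCH_BYTES && recordsInBatch < recordLimit) {
            recordLimit = recordsInBatch; // large rows: shrink future batches
        }
        return recordLimit;
    }
}
```

For example, a batch that hits 11 MB after only 2000 records would cap all subsequent batches at 2000 records, bounding memory without re-measuring every row.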
[jira] [Commented] (DRILL-4374) Drill rewrites Postgres query with ambiguous column references
[ https://issues.apache.org/jira/browse/DRILL-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152714#comment-15152714 ] Jacques Nadeau commented on DRILL-4374: --- Another potential similar example from mailing list: {code} SELECT cs.post_id, t.tag_id, cs.language_code, cs.likes_count, cs.comments_count, cs.clippings_count, cs.cr_recency_score FROM redshift.public.card_scores AS cs JOIN redshift.public.taggings AS t ON cs.post_id = t.post_id INNER JOIN redshift.public.min_scale_scores AS mss ON mss.post_id=cs.post_id WHERE cs.cr_recency_score IS NOT NULL AND t.status <> 'unpublished' Then, error raised. 2016-02-18 05:44:26,862 [293aa5c5-4dcd-3cd8-7b40-4847289d71fa:frag:0:0] INFO o.a.d.e.store.jdbc.JdbcRecordReader - User Error Occurred org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL query. sql SELECT * FROM (SELECT * FROM "public"."card_scores" INNER JOIN "public"."taggings" ON "card_scores"."post_id" = "taggings"."post_id" WHERE "card_scores"."cr_recency_score" IS NOT NULL AND "taggings"."status" <> 'unpublished') AS "t" INNER JOIN "public"."min_scale_scores" ON "t"."post_id" = "min_scale_scores"."post_id" plugin redshift [Error Id: 0ffb54f1-95b9-4a8b-b985-f05e16a2aa6a ] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.5.0.jar:1.5.0] at org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup(JdbcRecordReader.java:221) [drill-jdbc-storage-1.5.0.jar:1.5.0] Caused by: org.postgresql.util.PSQLException: ERROR: column reference "post_id" is ambiguous {code} > Drill rewrites Postgres query with ambiguous column references > -- > > Key: DRILL-4374 > URL: https://issues.apache.org/jira/browse/DRILL-4374 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.4.0 >Reporter: Justin Bradford >Assignee: Taras Supyk > > Drill drops table references 
when rewriting this query, resulting in > ambiguous column references. > This query: > {code:sql} > select s.uuid as site_uuid, psc.partner_id, > sum(psc.net_revenue_dollars) as revenue > from app.public.partner_site_clicks psc > join app.public.sites s on psc.site_id = s.id > join app.public.partner_click_days pcd on pcd.id = psc.partner_click_day_id > where s.generate_revenue_report is true and pcd.`day` = '2016-02-07' > group by s.uuid, psc.partner_id; > {code} > Results in this error: > {quote} > DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL > query. > {quote} > Trying to run this re-written query: > {code:sql} > SELECT "site_uuid", "partner_id", SUM("net_revenue_dollars") AS "revenue" > FROM ( > SELECT "uuid" AS "site_uuid", "partner_id", "net_revenue_dollars" > FROM "public"."partner_site_clicks" > INNER JOIN "public"."sites" ON "partner_site_clicks"."site_id" = > "sites"."id" > INNER JOIN "public"."partner_click_days" ON > "partner_site_clicks"."partner_click_day_id" = "partner_click_days"."id" > WHERE "sites"."generate_revenue_report" IS TRUE AND > "partner_click_days"."day" = '2016-02-07' > ) AS "t0" GROUP BY "site_uuid", "partner_id" > {code} > That query fails due to an ambiguous "partner_id" reference as two of the > tables have that column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
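The failure mode is visible by hand: the generated subquery projects bare column names, so any column present in more than one joined table becomes ambiguous to Postgres. A hand-disambiguated variant of the rewritten query (illustrative only — this is not Drill's actual output, just what a correct rewrite would need to look like) qualifies every projected column with its source table:

```sql
-- Hypothetical, hand-disambiguated form of the generated subquery.
-- Every projected column is qualified, so "partner_id" resolves even
-- though more than one joined table exposes a column of that name.
SELECT "site_uuid", "partner_id", SUM("net_revenue_dollars") AS "revenue"
FROM (
  SELECT "sites"."uuid" AS "site_uuid",
         "partner_site_clicks"."partner_id" AS "partner_id",
         "partner_site_clicks"."net_revenue_dollars"
  FROM "public"."partner_site_clicks"
  INNER JOIN "public"."sites"
    ON "partner_site_clicks"."site_id" = "sites"."id"
  INNER JOIN "public"."partner_click_days"
    ON "partner_site_clicks"."partner_click_day_id" = "partner_click_days"."id"
  WHERE "sites"."generate_revenue_report" IS TRUE
    AND "partner_click_days"."day" = '2016-02-07'
) AS "t0"
GROUP BY "site_uuid", "partner_id"
```

The bug, in other words, is that the table qualifiers present in the user's original query are dropped during the pushdown rewrite, not that the query itself is invalid.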
[jira] [Closed] (DRILL-3624) Enhance Web UI to be able to select schema ("use")
[ https://issues.apache.org/jira/browse/DRILL-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krystal closed DRILL-3624. -- Verified as part of DRILL-3201 > Enhance Web UI to be able to select schema ("use") > -- > > Key: DRILL-3624 > URL: https://issues.apache.org/jira/browse/DRILL-3624 > Project: Apache Drill > Issue Type: Wish > Components: Client - HTTP >Affects Versions: 1.1.0 >Reporter: Uwe Geercken >Priority: Minor > Fix For: 1.5.0 > > > It would be advantageous to be able to select a schema ("use") in the Web UI, > so that the schema does not always have to be specified in each query. > This could be realized, e.g., through a drop-down where the user selects the > schema from the list of available schemas. The UI should retain this > selection until a different schema is chosen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152772#comment-15152772 ] ASF GitHub Bot commented on DRILL-4410: --- Github user minji-kim commented on the pull request: https://github.com/apache/drill/pull/380#issuecomment-185846185 Added the checks in the test. > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. > org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152784#comment-15152784 ] ASF GitHub Bot commented on DRILL-4410: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/380#discussion_r53357932 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java --- @@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception { .go(); } + @Test // DRILL-4410 + // ListVector allocation + public void test_array() throws Exception{ + +long numRecords = 10; +String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + "arrays1.json"; --- End diff -- Will this work correctly on windows environment? > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. 
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
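The review question above about hard-coding a "/tmp" prefix has a standard plain-Java answer: ask the JVM for the platform temp directory instead. A minimal sketch (the class and method names below are illustrative, not Drill's actual BaseTestQuery API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper: builds a per-test scratch file under the platform
// temp directory instead of hard-coding "/tmp", so the same test code
// works on Linux, macOS, and Windows alike.
public class TempDirExample {
    public static Path tempJsonFile(String testName) throws IOException {
        // Files.createTempDirectory resolves against java.io.tmpdir,
        // which the JVM sets appropriately for the host OS.
        Path dir = Files.createTempDirectory(testName);
        return dir.resolve(testName + "-arrays1.json");
    }

    public static void main(String[] args) throws IOException {
        Path p = tempJsonFile("TestComplexTypeReader");
        // The resulting path is absolute and rooted in the temp directory.
        System.out.println(p.isAbsolute());
    }
}
```

This is the same portability concern that `BaseTestQuery.getTempDir(...)` (suggested later in the review thread) addresses inside Drill's own test harness.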
[jira] [Reopened] (DRILL-4235) Hit IllegalStateException when exec.queue.enable=ture
[ https://issues.apache.org/jira/browse/DRILL-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim reopened DRILL-4235: - Assignee: Hanifi Gunes (was: Deneche A. Hakim) temporarily reopening this to assign it to the developer who fixed it > Hit IllegalStateException when exec.queue.enable=ture > -- > > Key: DRILL-4235 > URL: https://issues.apache.org/jira/browse/DRILL-4235 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.5.0 > Environment: git.commit.id=6dea429949a3d6a68aefbdb3d78de41e0955239b >Reporter: Dechang Gu >Assignee: Hanifi Gunes >Priority: Critical > Fix For: 1.5.0 > > > 0: jdbc:drill:schema=dfs.parquet> select * from sys.options; > Error: SYSTEM ERROR: IllegalStateException: Failure trying to change states: > ENQUEUED --> RUNNING > [Error Id: 6ac8167c-6fb7-4274-9e5c-bf62a195c06e on ucs-node5.perf.lab:31010] > (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception > during fragment initialization: Exceptions caught during event processing > org.apache.drill.exec.work.foreman.Foreman.run():261 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (java.lang.RuntimeException) Exceptions caught during event > processing > org.apache.drill.common.EventProcessor.sendEvent():93 > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792 > org.apache.drill.exec.work.foreman.Foreman.moveToState():909 > org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420 > org.apache.drill.exec.work.foreman.Foreman.runSQL():926 > org.apache.drill.exec.work.foreman.Foreman.run():250 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (java.lang.IllegalStateException) Failure trying to change > states: ENQUEUED --> RUNNING > 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():896 > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():790 > org.apache.drill.common.EventProcessor.sendEvent():73 > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792 > org.apache.drill.exec.work.foreman.Foreman.moveToState():909 > org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420 > org.apache.drill.exec.work.foreman.Foreman.runSQL():926 > org.apache.drill.exec.work.foreman.Foreman.run():250 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-4235) Hit IllegalStateException when exec.queue.enable=ture
[ https://issues.apache.org/jira/browse/DRILL-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim closed DRILL-4235. --- Resolution: Fixed > Hit IllegalStateException when exec.queue.enable=ture > -- > > Key: DRILL-4235 > URL: https://issues.apache.org/jira/browse/DRILL-4235 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.5.0 > Environment: git.commit.id=6dea429949a3d6a68aefbdb3d78de41e0955239b >Reporter: Dechang Gu >Assignee: Hanifi Gunes >Priority: Critical > Fix For: 1.5.0 > > > 0: jdbc:drill:schema=dfs.parquet> select * from sys.options; > Error: SYSTEM ERROR: IllegalStateException: Failure trying to change states: > ENQUEUED --> RUNNING > [Error Id: 6ac8167c-6fb7-4274-9e5c-bf62a195c06e on ucs-node5.perf.lab:31010] > (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception > during fragment initialization: Exceptions caught during event processing > org.apache.drill.exec.work.foreman.Foreman.run():261 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (java.lang.RuntimeException) Exceptions caught during event > processing > org.apache.drill.common.EventProcessor.sendEvent():93 > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792 > org.apache.drill.exec.work.foreman.Foreman.moveToState():909 > org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420 > org.apache.drill.exec.work.foreman.Foreman.runSQL():926 > org.apache.drill.exec.work.foreman.Foreman.run():250 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 > Caused By (java.lang.IllegalStateException) Failure trying to change > states: ENQUEUED --> RUNNING > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():896 > 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():790 > org.apache.drill.common.EventProcessor.sendEvent():73 > org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792 > org.apache.drill.exec.work.foreman.Foreman.moveToState():909 > org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420 > org.apache.drill.exec.work.foreman.Foreman.runSQL():926 > org.apache.drill.exec.work.foreman.Foreman.run():250 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152811#comment-15152811 ] ASF GitHub Bot commented on DRILL-4410: --- Github user minji-kim commented on a diff in the pull request: https://github.com/apache/drill/pull/380#discussion_r53359912 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java --- @@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception { .go(); } + @Test // DRILL-4410 + // ListVector allocation + public void test_array() throws Exception{ + +long numRecords = 10; +String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + "arrays1.json"; --- End diff -- ParquetRecordReaderTest also uses "/tmp", so I think this should also work. > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. 
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4257) Ensure shutting down a Drillbit also shuts down all StoragePlugins
[ https://issues.apache.org/jira/browse/DRILL-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152815#comment-15152815 ] Deneche A. Hakim commented on DRILL-4257: - The only storage plugins that currently close resources explicitly are the Mongo and Kudu storage plugins. > Ensure shutting down a Drillbit also shuts down all StoragePlugins > -- > > Key: DRILL-4257 > URL: https://issues.apache.org/jira/browse/DRILL-4257 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Reporter: Jacques Nadeau >Assignee: Jacques Nadeau > Fix For: 1.5.0 > > > Right now, if a StoragePlugin implementation relies on the close method to > clean up resources, those resources won't be cleaned up when the Drillbit > class is shut down. This is because Drillbit doesn't actually close the > StoragePluginRegistry and associated resources. This leaks resources in > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152820#comment-15152820 ] ASF GitHub Bot commented on DRILL-4387: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/379#discussion_r53360482 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/GroupScan.java --- @@ -35,6 +35,8 @@ public interface GroupScan extends Scan, HasAffinity{ public static final List ALL_COLUMNS = ImmutableList.of(SchemaPath.getSimplePath("*")); + public static final List EMPTY_COLUMNS = ImmutableList.of(); --- End diff -- Nice catch. It's no longer needed. (Originally, I intended to convert NULL to EMPTY_COLUMNS, but now it's not necessary.) I'll remove that. Thanks. > Improve execution side when it handles skipAll query > > > Key: DRILL-4387 > URL: https://issues.apache.org/jira/browse/DRILL-4387 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > > DRILL-4279 changed the planner side and the RecordReader on the execution > side where they handle a skipAll query. However, there are other > places in the codebase that do not handle skipAll queries efficiently. In > particular, in GroupScan or ScanBatchCreator, we replace a NULL or empty > column list with the star column. This essentially forces the execution side > (RecordReader) to fetch all columns from the data source, which > leads to significant performance overhead for the SCAN operator. > To improve Drill's performance, we should change those places as well, as > follow-up work to DRILL-4279. > One simple example of this problem is: > {code} >SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`; > {code} > The query does not require any regular column from the parquet file. However, > ParquetRowGroupScan and ParquetScanBatchCreator will put the star column in > the column list. If the table has dozens or hundreds of columns, this makes > the SCAN operator much more expensive than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
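The convention this issue settles on — NULL meaning "project everything" versus an explicit empty list meaning "project no regular columns" — can be sketched in isolation. The names below are illustrative, not Drill's actual GroupScan/ScanBatchCreator signatures:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the NULL-vs-empty column-list convention:
// null asks the reader for every column (star), while an explicit
// empty list asks it to materialize none, which is what a skipAll
// query such as SELECT DISTINCT substring(dir1, 5) FROM ... needs.
// Collapsing empty to star forces a full-width scan.
public class ColumnListExample {
    public static final List<String> ALL_COLUMNS =
            Collections.singletonList("*");

    // Old behavior: both null and empty were widened to star.
    public static List<String> widenAll(List<String> cols) {
        return (cols == null || cols.isEmpty()) ? ALL_COLUMNS : cols;
    }

    // Patched behavior: only null is widened; empty stays empty,
    // so the SCAN operator fetches no regular columns.
    public static List<String> widenNullOnly(List<String> cols) {
        return (cols == null) ? ALL_COLUMNS : cols;
    }
}
```

The cost difference is exactly the one described above: on a table with hundreds of columns, `widenAll` turns a zero-column scan into a full scan.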
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152837#comment-15152837 ] ASF GitHub Bot commented on DRILL-4387: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/379#discussion_r53361870 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java --- @@ -87,9 +87,6 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS newColumns.add(column); } } - if (newColumns.isEmpty()) { --- End diff -- I went through all the ScanBatchCreator in Drill's code base. Seems ParquetScanBatchCreator is the only one that is converting an empty column list to ALL_COLUMNS. Looking at the history, seems DRILL-1845 added the code, probably just to make it work in parquet for skipAll query. With the patch of DRILL-4279, parquet record reader would be able to handle empty column list. Besides ParquetScanBatchCreator, this patch also modifies HBaseGroupScan, EasyGroupScan where it originally interprets empty column lists into ALL_COLUMNS. I'll add some comment to the code to clarify the different meaning of NULL and empty column list. > Improve execution side when it handles skipAll query > > > Key: DRILL-4387 > URL: https://issues.apache.org/jira/browse/DRILL-4387 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > > DRILL-4279 changes the planner side and the RecordReader in the execution > side when they handles skipAll query. However, it seems there are other > places in the codebase that do not handle skipAll query efficiently. In > particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty > column list with star column. This essentially will force the execution side > (RecordReader) to fetch all the columns for data source. 
Such behavior will > lead to big performance overhead for the SCAN operator. > To improve Drill's performance, we should change those places as well, as a > follow-up work after DRILL-4279. > One simple example of this problem is: > {code} >SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`; > {code} > The query does not require any regular column from the parquet file. However, > ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the > column list. In case table has dozens or hundreds of columns, this will make > SCAN operator much more expensive than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152841#comment-15152841 ] ASF GitHub Bot commented on DRILL-4410: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/380#discussion_r53362337 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java --- @@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception { .go(); } + @Test // DRILL-4410 + // ListVector allocation + public void test_array() throws Exception{ + +long numRecords = 10; +String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + "arrays1.json"; --- End diff -- Seems ParquetRecordReaderTest is ignored? [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java#L84 > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. 
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4161) Make Hive Metastore client caching user configurable.
[ https://issues.apache.org/jira/browse/DRILL-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152844#comment-15152844 ] Rahul Challapalli commented on DRILL-4161: -- Verified! (Not automated as the test framework does not support this use case without hacking it) Created 2 hive plugins with the below details and verified that the metastore cache is properly updated based on the below settings. hive1 : {code} "hive.metastore.cache-ttl-seconds": "600", "hive.metastore.cache-expire-after": "access" {code} hive2: {code} "hive.metastore.cache-ttl-seconds": "30", "hive.metastore.cache-expire-after": "write" {code} > Make Hive Metastore client caching user configurable. > - > > Key: DRILL-4161 > URL: https://issues.apache.org/jira/browse/DRILL-4161 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni > Labels: documentation > Fix For: 1.5.0 > > > Drill leverages LoadingCache in hive metastore client, in order to avoid the > long access time to hive metastore server. However, there is a tradeoff > between caching stale data and the possibility of cache hit. > For instance, DRILL-3893 changes cache invalidation policy to "1 minute after > last write", to avoid the chances of hitting stale data. However, it also > implies that the cached data would be only valid for 1 minute after > loading/write. > It's desirable to allow user to configure the caching policy, per their > individual use case requirement. In particular, we probably should allow user > to specify: > 1) caching invalidation policy : expire after last access, or expire after > last write. > 2) cache TTL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
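The two policies exposed by the new options differ in which event restarts the expiry clock: "access" expiry renews an entry's lifetime on every read, while "write" expiry counts only from the last load. Drill delegates this to Guava's LoadingCache; the stdlib-only class below is an illustrative sketch of just the semantics, with an injected clock so no real waiting is needed:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Illustrative expire-after-access vs expire-after-write cache.
// A LongSupplier clock is injected so the behavior is testable
// without sleeping; Drill itself uses Guava CacheBuilder for this.
public class ExpiryExample {
    public enum Policy { ACCESS, WRITE }

    private final long ttl;
    private final Policy policy;
    private final LongSupplier clock;
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> stamps = new HashMap<>();

    public ExpiryExample(long ttl, Policy policy, LongSupplier clock) {
        this.ttl = ttl;
        this.policy = policy;
        this.clock = clock;
    }

    public void put(String k, String v) {
        values.put(k, v);
        stamps.put(k, clock.getAsLong());     // both policies stamp on write
    }

    public String get(String k) {
        Long stamp = stamps.get(k);
        if (stamp == null || clock.getAsLong() - stamp > ttl) {
            values.remove(k);                 // expired: caller would reload
            stamps.remove(k);
            return null;
        }
        if (policy == Policy.ACCESS) {
            stamps.put(k, clock.getAsLong()); // reads renew the lease
        }
        return values.get(k);
    }
}
```

Under "access" expiry a hot entry can stay cached (and stale) indefinitely, which is why DRILL-3893 originally chose "1 minute after last write"; making the policy and TTL configurable lets each deployment pick its own staleness/hit-rate tradeoff.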
[jira] [Commented] (DRILL-4328) Fix for backward compatibility regression caused by DRILL-4198
[ https://issues.apache.org/jira/browse/DRILL-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152849#comment-15152849 ] Rahul Challapalli commented on DRILL-4328: -- There is no simple functional test that can verify this (unless we create something which consumes the interface that is being changed/reverted). I believe a unit test is sufficient for this. > Fix for backward compatibility regression caused by DRILL-4198 > -- > > Key: DRILL-4328 > URL: https://issues.apache.org/jira/browse/DRILL-4328 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Reporter: Venki Korukanti >Assignee: Venki Korukanti > Fix For: 1.5.0 > > > Revert updates made to StoragePlugin interface in DRILL-4198. Instead add the > new methods to AbstractStoragePlugin. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
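The revert strategy described — moving the new methods off the StoragePlugin interface and onto AbstractStoragePlugin — is the standard way to evolve a published API without breaking third-party implementors, since a pre-Java-8 interface cannot carry method bodies. A schematic illustration (names simplified, not Drill's actual signatures):

```java
// Adding a method to a published interface breaks every existing
// implementor, because each must now supply a body. Adding it to the
// shared abstract base class with a safe default keeps old plugins
// source- and binary-compatible.
public class CompatExample {
    // Published contract: left untouched by the fix.
    interface StoragePlugin {
        String getName();
    }

    // New capability lives here, with a conservative default that
    // pre-existing subclasses inherit automatically.
    abstract static class AbstractStoragePlugin implements StoragePlugin {
        public boolean supportsRead() {
            return false;
        }
    }

    // A third-party plugin written before the change compiles unchanged.
    static class LegacyPlugin extends AbstractStoragePlugin {
        @Override
        public String getName() { return "legacy"; }
    }
}
```

This also explains the comment above about testing: the regression is a compile/link-time compatibility property, so a unit test exercising the interface is a more direct check than any functional query test.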
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152884#comment-15152884 ] ASF GitHub Bot commented on DRILL-4410: --- Github user adeneche commented on a diff in the pull request: https://github.com/apache/drill/pull/380#discussion_r53365438 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java --- @@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception { .go(); } + @Test // DRILL-4410 + // ListVector allocation + public void test_array() throws Exception{ + +long numRecords = 10; +String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + "arrays1.json"; --- End diff -- an alternative is to use BaseTestQuery.getTempDir("ComplexTypeWriter") > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. 
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152890#comment-15152890 ] ASF GitHub Bot commented on DRILL-4410: --- Github user minji-kim commented on a diff in the pull request: https://github.com/apache/drill/pull/380#discussion_r53366006 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java --- @@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception { .go(); } + @Test // DRILL-4410 + // ListVector allocation + public void test_array() throws Exception{ + +long numRecords = 10; +String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + "arrays1.json"; --- End diff -- I think these tests all use /tmp. https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/impersonation/TestImpersonationMetadata.java#L64 https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/TestDropTable.java#L166 https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestWriter.java#L60 > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. 
> [stack trace identical to the one quoted in the first DRILL-4410 message above] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152896#comment-15152896 ] ASF GitHub Bot commented on DRILL-4387: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/379#discussion_r53366236 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java --- @@ -87,9 +87,6 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS newColumns.add(column); } } - if (newColumns.isEmpty()) { --- End diff -- @amansinha100 , I made slightly change to the patch to address the comments. Could you please take another look? Thanks! > Improve execution side when it handles skipAll query > > > Key: DRILL-4387 > URL: https://issues.apache.org/jira/browse/DRILL-4387 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > > DRILL-4279 changes the planner side and the RecordReader in the execution > side when they handles skipAll query. However, it seems there are other > places in the codebase that do not handle skipAll query efficiently. In > particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty > column list with star column. This essentially will force the execution side > (RecordReader) to fetch all the columns for data source. Such behavior will > lead to big performance overhead for the SCAN operator. > To improve Drill's performance, we should change those places as well, as a > follow-up work after DRILL-4279. > One simple example of this problem is: > {code} >SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`; > {code} > The query does not require any regular column from the parquet file. However, > ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the > column list. 
In case table has dozens or hundreds of columns, this will make > SCAN operator much more expensive than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
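The inefficiency described is that a null or empty projection gets silently widened to the star column, forcing the reader to fetch every column. A minimal illustration of the intended resolution (the helper and its signature are hypothetical, not Drill's API): preserve the empty list so the scan can produce only a record count instead of materializing all columns.

```java
import java.util.Collections;
import java.util.List;

class ProjectionUtil {
    // Hypothetical helper deciding which columns a reader should fetch.
    // Old behavior: null/empty => all table columns (star expansion, expensive).
    // Desired behavior: null/empty => skip-all, i.e. no data columns at all.
    static List<String> columnsToRead(List<String> requested, List<String> tableColumns) {
        if (requested == null || requested.isEmpty()) {
            return Collections.emptyList(); // skip-all: scan only counts records
        }
        return requested;
    }
}
```

For a query like the `SELECT DISTINCT substring(dir1, 5)` example, only partition metadata is needed, so the empty column list is the correct, cheap answer.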
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152907#comment-15152907 ] ASF GitHub Bot commented on DRILL-4410: --- Github user vkorukanti commented on a diff in the pull request: https://github.com/apache/drill/pull/380#discussion_r53366855 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java --- @@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception { .go(); } + @Test // DRILL-4410 + // ListVector allocation + public void test_array() throws Exception{ + +long numRecords = 10; +String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + "arrays1.json"; --- End diff -- TestImpersonationMetadata refers to a directory on HDFS which should be fine as the paths are unix style. > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. 
> [stack trace identical to the one quoted in the first DRILL-4410 message above] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
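The review thread above debates hardcoded `/tmp` paths versus a framework helper like `BaseTestQuery.getTempDir(...)`. With only the JDK, the portable pattern is `Files.createTempDirectory`, which honors `java.io.tmpdir` on every platform — a sketch, not the Drill test framework's actual helper:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TempDirExample {
    // Create a per-test scratch directory instead of writing to a fixed /tmp path.
    static Path createScratchDir(String testName) {
        try {
            return Files.createTempDirectory(testName); // lands under java.io.tmpdir
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a small data file into the scratch directory.
    static Path writeFile(Path dir, String name, String content) {
        try {
            return Files.write(dir.resolve(name), content.getBytes("UTF-8"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would then build its path as `scratch.resolve("arrays1.json")` rather than concatenating onto `"/tmp/"`, which also works on platforms where `/tmp` does not exist.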
[jira] [Created] (DRILL-4414) Generate test data for TestWindowFrame on the fly
Deneche A. Hakim created DRILL-4414: --- Summary: Generate test data for TestWindowFrame on the fly Key: DRILL-4414 URL: https://issues.apache.org/jira/browse/DRILL-4414 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Affects Versions: 1.2.0 Reporter: Deneche A. Hakim Assignee: Deneche A. Hakim Priority: Minor Fix For: Future Currently, for TestWindowFrame the test data and corresponding expected results are part of the source code. Those files are generated by a tool that is also part of the source code (GenerateTestData). We should update the code to generate those files on the fly; this way we won't have to track those files and add new ones whenever we add a new test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
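The improvement described — generating data files during the test run instead of checking them in — can be sketched with plain JDK I/O. The record layout below is made up for illustration; `GenerateTestData`'s real output format is not reproduced here:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class WindowTestDataGenerator {
    // Emit one JSON record per line (the layout many Drill JSON tests use)
    // into a temp file created fresh for each test run.
    static Path generate(int numRecords) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < numRecords; i++) {
            lines.add(String.format("{\"position_id\": %d, \"salary\": %d}", i % 4, i * 10));
        }
        try {
            Path file = Files.createTempFile("window-frame-", ".json");
            Files.write(file, lines);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long countLines(Path file) {
        try {
            return Files.lines(file).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `generate(n)` from a `@BeforeClass` hook means nothing needs to be tracked in version control, and adding a new test only requires a new generator call.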
[jira] [Updated] (DRILL-4412) Have an array of DrillBitEndPoints (at least) for leaf fragments instead of single one
[ https://issues.apache.org/jira/browse/DRILL-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Feldman updated DRILL-4412: -- Description: To follow up on the ability to submit simple physical plan directly to a DrillBit for execution [DRILL-4132|https://issues.apache.org/jira/browse/DRILL-4132] it would be beneficial to have an array of DrillBitEndPoint in PlanFragment. Leaf fragments that scan the data can have an array of DrillBitEndPoint based on data locality, as data may be replicated and in case it is necessary to restart Scan fragment it can be restarted on DrillBits that have replica of the data, versus always retrying the same DrillBit. (was: To follow up on the ability to submit simple physical plan directly to a DrillBit for execution [JIRA-4132|https://issues.apache.org/jira/browse/DRILL-4132] it would be beneficial to have an array of DrillBitEndPoint in PlanFragment. Leaf fragments that scan the data can have an array of DrillBitEndPoint based on data locality, as data may be replicated and in case it is necessary to restart Scan fragment it can be restarted on DrillBits that have replica of the data, versus always retrying the same DrillBit.) > Have an array of DrillBitEndPoints (at least) for leaf fragments instead of > single one > -- > > Key: DRILL-4412 > URL: https://issues.apache.org/jira/browse/DRILL-4412 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Reporter: Yuliya Feldman >Assignee: Yuliya Feldman > > To follow up on the ability to submit simple physical plan directly to a > DrillBit for execution > [DRILL-4132|https://issues.apache.org/jira/browse/DRILL-4132] it would be > beneficial to have an array of DrillBitEndPoint in PlanFragment. 
Leaf > fragments that scan the data can have an array of DrillBitEndPoint based on > data locality, as data may be replicated and in case it is necessary to > restart Scan fragment it can be restarted on DrillBits that have replica of > the data, versus always retrying the same DrillBit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
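The proposal above can be sketched as a retry loop over candidate endpoints: instead of a single assigned Drillbit, a leaf fragment carries the list of Drillbits holding a data replica, and a start failure moves on to the next one. All names here are hypothetical, not the `PlanFragment` protobuf's actual fields:

```java
import java.util.List;
import java.util.function.Predicate;

class LeafFragmentScheduler {
    // Try each candidate endpoint in locality order; return the one that
    // accepted the fragment, or null if every replica holder failed.
    static String assign(List<String> replicaEndpoints, Predicate<String> tryStart) {
        for (String endpoint : replicaEndpoints) {
            if (tryStart.test(endpoint)) {
                return endpoint;
            }
        }
        return null; // no replica holder available; caller must fail the query
    }
}
```

The design point is that the restart target set is fixed at planning time from data locality, so a retry never lands on a Drillbit without a replica.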
[jira] [Updated] (DRILL-4281) Drill should support inbound impersonation
[ https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4281: - Reviewer: Chun Chang > Drill should support inbound impersonation > -- > > Key: DRILL-4281 > URL: https://issues.apache.org/jira/browse/DRILL-4281 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: security > > Today Drill supports impersonation *to* external sources. For example I can > authenticate to Drill as myself and then Drill will access HDFS using > impersonation > In many scenarios we also need impersonation to Drill. For example I might > use some front end tool (such as Tableau) and authenticate to it as myself. > That tool (server version) then needs to access Drill to perform queries and > I want those queries to run as myself, not as the Tableau user. While in > theory the intermediate tool could store the userid & password for every user > to the Drill this isn't a scalable or very secure solution. > Note that HS2 today does support inbound impersonation as described here: > https://issues.apache.org/jira/browse/HIVE-5155 > The above is not the best approach as it is tied to the connection object > which is very coarse grained and potentially expensive. It would be better if > there was a call on the ODBC/JDBC driver to switch the identity on a existing > connection. Most modern SQL databases (Oracle, DB2) support such function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files
[ https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152954#comment-15152954 ] Deneche A. Hakim commented on DRILL-4392: - [~sphillips] any ETA when this could be fixed ? > CTAS with partition writes an internal field into generated parquet files > - > > Key: DRILL-4392 > URL: https://issues.apache.org/jira/browse/DRILL-4392 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Steven Phillips >Priority: Blocker > > On today's master branch: > {code} > select * from sys.version; > +-+---+-++-++ > | version | commit_id | > commit_message|commit_time > | build_email | build_time | > +-+---+-++-++ > | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: > Remove dependency on drill-logical from vector package | 16.02.2016 @ > 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST | > +-+---+-++- > {code} > Parquet table created by Drill's CTAS statement has one internal field > "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field would not > impact non-star query, but would cause incorrect result for star query. > {code} > use dfs.tmp; > create table nation_ctas partition by (n_regionkey) as select * from > cp.`tpch/nation.parquet`; > select * from dfs.tmp.nation_ctas limit 6; > +--++--+-++ > | n_nationkey | n_name | n_regionkey | > n_comment > | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R | > +--++--+-++ > | 5| ETHIOPIA | 0| ven packages wake quickly. > regu >| true | > | 15 | MOROCCO| 0| rns. blithely bold courts > among the closely regular packages use furiously bold platelets? > | false | > | 14 | KENYA | 0| pending excuses haggle > furiously deposits. pending, express pinto beans wake fluffily past t > | false | > | 0| ALGERIA| 0| haggle. carefully final > deposits detect slyly agai > | false | > | 16 | MOZAMBIQUE | 0| s. ironic, unusual > asymptotes wake blithely r >| false | > | 24 | UNITED STATES | 1| y final packages. 
slow foxes > cajole quickly. quickly silent platelets breach ironic accounts. unusual > pinto be | true > {code} > This basically breaks all the parquet files created by Drill's CTAS with > partition support. > Also, it will also fail one of the Pre-commit functional test [1] > [1] > https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table
[ https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153147#comment-15153147 ] ASF GitHub Bot commented on DRILL-3688: --- GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/382 DRILL-3688: Drill should honor "skip.header.line.count" and "skip.foo… Drill should honor "skip.header.line.count" and "skip.footer.line.count" attribute of Hive table: 1. Functionality to skip header and footer lines while reading Hive data. 2. Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/arina-ielchiieva/drill DRILL-3688 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/382.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #382 commit 0482cea2c771b167907e32258fb649979cddcf49 Author: Arina Ielchiieva Date: 2016-02-11T17:16:30Z DRILL-3688: Drill should honor "skip.header.line.count" and "skip.footer.line.count" attribute of Hive table 1. Functionality to skip header and footer lines while reading Hive data. 2. Unit tests. > Drill should honor "skip.header.line.count" and "skip.footer.line.count" > attributes of Hive table > - > > Key: DRILL-3688 > URL: https://issues.apache.org/jira/browse/DRILL-3688 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Affects Versions: 1.1.0 > Environment: 1.1 >Reporter: Hao Zhu >Assignee: Arina Ielchiieva > Fix For: Future > > > Currently Drill does not honor the "skip.header.line.count" attribute of Hive > table. > It may cause some other format conversion issue. > Reproduce: > 1. Create a Hive table > {code} > create table h1db.testheader(col0 string) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' > STORED AS TEXTFILE > tblproperties("skip.header.line.count"="1"); > {code} > 2. 
Prepare a sample data: > {code} > # cat test.data > col0 > 2015-01-01 > {code} > 3. Load sample data into Hive > {code} > LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader; > {code} > 4. Hive > {code} > hive> select * from h1db.testheader ; > OK > 2015-01-01 > Time taken: 0.254 seconds, Fetched: 1 row(s) > {code} > 5. Drill > {code} > > select * from hive.h1db.testheader ; > +-+ > |col0 | > +-+ > | col0| > | 2015-01-01 | > +-+ > 2 rows selected (0.257 seconds) > > select cast(col0 as date) from hive.h1db.testheader ; > Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must > be in the range [1,12] > Fragment 0:0 > [Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010] > (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be > in the range [1,12] > org.joda.time.field.FieldUtils.verifyValueBounds():236 > org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613 > org.joda.time.chrono.BasicChronology.getDateTimeMillis():159 > org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120 > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261 > org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218 > org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67 > org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():147 > org.apache.drill.exec.physical.impl.BaseRootExec.next():83 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79 > org.apache.drill.exec.physical.impl.BaseRootExec.next():73 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1566 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():255 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) > {code} > Also "skip.footer.line.count" should be taken into account. > If "
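The requested behavior — honoring `skip.header.line.count` and `skip.footer.line.count` — amounts to dropping the first *h* and last *f* lines of each file. A self-contained sketch of that rule (not the actual patch, which does this inside Drill's Hive record reader):

```java
import java.util.List;

class HeaderFooterSkipper {
    // Return the data lines with the first `header` and last `footer`
    // lines removed; degenerate counts collapse to an empty result.
    static List<String> skip(List<String> lines, int header, int footer) {
        int from = Math.min(header, lines.size());
        int to = Math.max(from, lines.size() - footer);
        return lines.subList(from, to);
    }
}
```

For a streaming reader the footer case needs a small look-ahead buffer of `footer` lines, since the end of the split is not known in advance; the header case is just a skip counter.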
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153156#comment-15153156 ] ASF GitHub Bot commented on DRILL-4387: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/379#discussion_r53386823 --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseGroupScan.java --- @@ -34,6 +34,7 @@ import java.util.concurrent.TimeUnit; import com.fasterxml.jackson.annotation.JsonCreator; +import com.google.common.base.Objects; --- End diff -- unnecessary import ? > Improve execution side when it handles skipAll query > > > Key: DRILL-4387 > URL: https://issues.apache.org/jira/browse/DRILL-4387 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > > DRILL-4279 changes the planner side and the RecordReader in the execution > side when they handles skipAll query. However, it seems there are other > places in the codebase that do not handle skipAll query efficiently. In > particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty > column list with star column. This essentially will force the execution side > (RecordReader) to fetch all the columns for data source. Such behavior will > lead to big performance overhead for the SCAN operator. > To improve Drill's performance, we should change those places as well, as a > follow-up work after DRILL-4279. > One simple example of this problem is: > {code} >SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`; > {code} > The query does not require any regular column from the parquet file. However, > ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the > column list. In case table has dozens or hundreds of columns, this will make > SCAN operator much more expensive than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153159#comment-15153159 ] ASF GitHub Bot commented on DRILL-4387: --- Github user amansinha100 commented on the pull request: https://github.com/apache/drill/pull/379#issuecomment-185934019 LGTM +1. > Improve execution side when it handles skipAll query > > > Key: DRILL-4387 > URL: https://issues.apache.org/jira/browse/DRILL-4387 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > > DRILL-4279 changes the planner side and the RecordReader in the execution > side when they handles skipAll query. However, it seems there are other > places in the codebase that do not handle skipAll query efficiently. In > particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty > column list with star column. This essentially will force the execution side > (RecordReader) to fetch all the columns for data source. Such behavior will > lead to big performance overhead for the SCAN operator. > To improve Drill's performance, we should change those places as well, as a > follow-up work after DRILL-4279. > One simple example of this problem is: > {code} >SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`; > {code} > The query does not require any regular column from the parquet file. However, > ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the > column list. In case table has dozens or hundreds of columns, this will make > SCAN operator much more expensive than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query
[ https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153168#comment-15153168 ] ASF GitHub Bot commented on DRILL-4387: --- Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/379#discussion_r53387614 --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseGroupScan.java --- @@ -34,6 +34,7 @@ import java.util.concurrent.TimeUnit; import com.fasterxml.jackson.annotation.JsonCreator; +import com.google.common.base.Objects; --- End diff -- right. I'll remove these unused imports. Thanks. > Improve execution side when it handles skipAll query > > > Key: DRILL-4387 > URL: https://issues.apache.org/jira/browse/DRILL-4387 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.6.0 > > > DRILL-4279 changes the planner side and the RecordReader in the execution > side when they handles skipAll query. However, it seems there are other > places in the codebase that do not handle skipAll query efficiently. In > particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty > column list with star column. This essentially will force the execution side > (RecordReader) to fetch all the columns for data source. Such behavior will > lead to big performance overhead for the SCAN operator. > To improve Drill's performance, we should change those places as well, as a > follow-up work after DRILL-4279. > One simple example of this problem is: > {code} >SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`; > {code} > The query does not require any regular column from the parquet file. However, > ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the > column list. In case table has dozens or hundreds of columns, this will make > SCAN operator much more expensive than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-3869) Trailing semicolon causes web UI to fail
[ https://issues.apache.org/jira/browse/DRILL-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krystal closed DRILL-3869. -- Verified that queries that end with a ";" run successfully from web UI. > Trailing semicolon causes web UI to fail > > > Key: DRILL-3869 > URL: https://issues.apache.org/jira/browse/DRILL-3869 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Andrew > Fix For: 1.5.0 > > > When submitting a query through the web UI, if the user types in a trailing > ';' the query will fail with the error message: > org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: > Encountered ";" at line 1, column 42. Was expecting one of: "OFFSET" ... > "FETCH" ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
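A minimal sketch of the kind of pre-parse normalization that makes such queries succeed. The helper name is hypothetical; this is not necessarily how the web UI fix was implemented.

```java
// Hypothetical pre-parse step: strip a trailing semicolon before handing
// the query text to the SQL parser, which otherwise rejects it with a
// PARSE ERROR like the one quoted above.
class QueryNormalizer {
    static String stripTrailingSemicolon(String sql) {
        String trimmed = sql.trim();
        if (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return trimmed;
    }
}
```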
[jira] [Closed] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak
[ https://issues.apache.org/jira/browse/DRILL-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Challapalli closed DRILL-4353. Verified! > Expired sessions in web server are not cleaning up resources, leading to > resource leak > -- > > Key: DRILL-4353 > URL: https://issues.apache.org/jira/browse/DRILL-4353 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP, Web Server >Affects Versions: 1.5.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti >Priority: Blocker > Fix For: 1.5.0 > > > Currently we store the session resources (including DrillClient) in an attribute, a > {{SessionAuthentication}} object which implements > {{HttpSessionBindingListener}}. Whenever a session is invalidated, all > attributes are removed, and if an attribute class implements > {{HttpSessionBindingListener}}, the listener is informed. > {{SessionAuthentication}}'s implementation of {{HttpSessionBindingListener}} > logs out the user, which includes cleaning up the resources as well, but > {{SessionAuthentication}} relies on a {{ServletContext}} stored in a thread-local > variable (see > [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]). > In the case of the thread that cleans up expired sessions, there is no > {{ServletContext}} in the thread-local variable, so the user is not logged > out properly and resources leak. > Fix: add an {{HttpSessionEventListener}} to clean up the > {{SessionAuthentication}} and resources every time an {{HttpSession}} is expired > or invalidated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
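The fix described above can be illustrated with a self-contained sketch. The real change registers a session event listener with the servlet container (Jetty); here a minimal stand-in Session models the pattern: on destroy, close any AutoCloseable attribute (such as the per-session DrillClient wrapper) regardless of which thread triggered the expiry, with no reliance on a ServletContext thread-local. The class and attribute names are assumptions for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for an HTTP session; the real code works against
// Jetty's HttpSession and a container-registered session listener.
class Session {
    final Map<String, Object> attributes = new HashMap<>();
}

class SessionCleanupListener {
    // Invoked for both explicit invalidation and background expiry, so
    // resources are released even when the scavenger thread (which has no
    // ServletContext in thread-local state) destroys the session.
    static void sessionDestroyed(Session session) {
        for (Object attr : session.attributes.values()) {
            if (attr instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) attr).close();
                } catch (Exception e) {
                    // log and continue; cleanup must not abort teardown
                }
            }
        }
        session.attributes.clear();
    }
}
```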
[jira] [Updated] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak
[ https://issues.apache.org/jira/browse/DRILL-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Challapalli updated DRILL-4353: - Reviewer: Rahul Challapalli (was: Krystal) > Expired sessions in web server are not cleaning up resources, leading to > resource leak > -- > > Key: DRILL-4353 > URL: https://issues.apache.org/jira/browse/DRILL-4353 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP, Web Server >Affects Versions: 1.5.0 >Reporter: Venki Korukanti >Assignee: Venki Korukanti >Priority: Blocker > Fix For: 1.5.0 > > > Currently we store the session resources (including DrillClient) in attribute > {{SessionAuthentication}} object which implements > {{HttpSessionBindingListener}}. Whenever a session is invalidated, all > attributes are removed and if an attribute class implements > {{HttpSessionBindingListener}}, listener is informed. > {{SessionAuthentication}} implementation of {{HttpSessionBindingListener}} > logs out the user which includes cleaning up the resources as well, but > {{SessionAuthentication}} relies on ServletContext stored in thread local > variable (see > [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]). > In case of thread that cleans up the expired sessions there is no > {{ServletContext}} in thread local variable, leading to not logging out the > user properly and resource leak. > Fix: Add {{HttpSessionEventListener}} to cleanup the > {{SessionAuthentication}} and resources every time a HttpSession is expired > or invalidated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files
[ https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153419#comment-15153419 ] ASF GitHub Bot commented on DRILL-4392: --- GitHub user jinfengni opened a pull request: https://github.com/apache/drill/pull/383 DRILL-4392: Fix CTAS partition to remove one unnecessary internal fie… …ld in generated parquet files. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinfengni/incubator-drill DRILL-4392 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/383.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #383 commit bc6427685b9b3a7846acbc177cfe6e7e1163ec6e Author: Jinfeng Ni Date: 2016-02-18T23:38:42Z DRILL-4392: Fix CTAS partition to remove one unnecessary internal field in generated parquet files. > CTAS with partition writes an internal field into generated parquet files > - > > Key: DRILL-4392 > URL: https://issues.apache.org/jira/browse/DRILL-4392 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Steven Phillips >Priority: Blocker > > On today's master branch: > {code} > select * from sys.version; > +-+---+-++-++ > | version | commit_id | > commit_message|commit_time > | build_email | build_time | > +-+---+-++-++ > | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: > Remove dependency on drill-logical from vector package | 16.02.2016 @ > 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST | > +-+---+-++- > {code} > Parquet table created by Drill's CTAS statement has one internal field > "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field would not > impact non-star query, but would cause incorrect result for star query. 
> {code} > use dfs.tmp; > create table nation_ctas partition by (n_regionkey) as select * from > cp.`tpch/nation.parquet`; > select * from dfs.tmp.nation_ctas limit 6; > +--++--+-++ > | n_nationkey | n_name | n_regionkey | > n_comment > | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R | > +--++--+-++ > | 5| ETHIOPIA | 0| ven packages wake quickly. > regu >| true | > | 15 | MOROCCO| 0| rns. blithely bold courts > among the closely regular packages use furiously bold platelets? > | false | > | 14 | KENYA | 0| pending excuses haggle > furiously deposits. pending, express pinto beans wake fluffily past t > | false | > | 0| ALGERIA| 0| haggle. carefully final > deposits detect slyly agai > | false | > | 16 | MOZAMBIQUE | 0| s. ironic, unusual > asymptotes wake blithely r >| false | > | 24 | UNITED STATES | 1| y final packages. slow foxes > cajole quickly. quickly silent platelets breach ironic accounts. unusual > pinto be | true > {code} > This basically breaks all the parquet files created by Drill's CTAS with > partition support. >
[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files
[ https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153435#comment-15153435 ] Jinfeng Ni commented on DRILL-4392: --- I submitted a patch for this issue. It seems the issue was caused by the change of MaterializedField.getPath() to return a String instead of a SchemaPath. That makes the check for the internal partition-related field fail, since one side uses a String while the other uses a SchemaPath. The fix is simply to compare two Strings. On a side note, I'm not sure if it is the right way to change MaterializedField.getPath() to return a String instead of a SchemaPath. Returning a String means we have to ensure the case sensitivity in comparison is consistent across the code base, which seems harder to enforce. > CTAS with partition writes an internal field into generated parquet files > - > > Key: DRILL-4392 > URL: https://issues.apache.org/jira/browse/DRILL-4392 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Steven Phillips >Priority: Blocker > > On today's master branch: > {code} > select * from sys.version; > +-+---+-++-++ > | version | commit_id | > commit_message|commit_time > | build_email | build_time | > +-+---+-++-++ > | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: > Remove dependency on drill-logical from vector package | 16.02.2016 @ > 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST | > +-+---+-++- > {code} > Parquet table created by Drill's CTAS statement has one internal field > "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field would not > impact non-star query, but would cause incorrect result for star query. 
> {code} > use dfs.tmp; > create table nation_ctas partition by (n_regionkey) as select * from > cp.`tpch/nation.parquet`; > select * from dfs.tmp.nation_ctas limit 6; > +--++--+-++ > | n_nationkey | n_name | n_regionkey | > n_comment > | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R | > +--++--+-++ > | 5| ETHIOPIA | 0| ven packages wake quickly. > regu >| true | > | 15 | MOROCCO| 0| rns. blithely bold courts > among the closely regular packages use furiously bold platelets? > | false | > | 14 | KENYA | 0| pending excuses haggle > furiously deposits. pending, express pinto beans wake fluffily past t > | false | > | 0| ALGERIA| 0| haggle. carefully final > deposits detect slyly agai > | false | > | 16 | MOZAMBIQUE | 0| s. ironic, unusual > asymptotes wake blithely r >| false | > | 24 | UNITED STATES | 1| y final packages. slow foxes > cajole quickly. quickly silent platelets breach ironic accounts. unusual > pinto be | true > {code} > This basically breaks all the parquet files created by Drill's CTAS with > partition support. > Also, it will also fail one of the Pre-commit functional test [1] > [1] > https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
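The comparison fix described in the comment above can be sketched as follows. The constant value matches the internal field name shown in the issue description; the class and helper names are assumptions for illustration, not Drill's actual code.

```java
// Once MaterializedField.getPath() returns a String, the check for the
// internal partition field must compare String against String (and do so
// case-insensitively), rather than comparing a String against a
// SchemaPath, which never matches.
class PartitionFieldCheck {
    static final String PARTITION_COMPARATOR_FIELD =
        "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R";

    static boolean isPartitionComparator(String fieldPath) {
        return PARTITION_COMPARATOR_FIELD.equalsIgnoreCase(fieldPath);
    }
}
```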
[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException
[ https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153584#comment-15153584 ] ASF GitHub Bot commented on DRILL-4410: --- Github user minji-kim commented on the pull request: https://github.com/apache/drill/pull/380#issuecomment-186020856 I made a change in the test to use the temporary directory, since that seems to be questionable. Also I added another test in TestValueVector. Both tests fail without the change in ListVector. > ListVector causes OversizedAllocationException > -- > > Key: DRILL-4410 > URL: https://issues.apache.org/jira/browse/DRILL-4410 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: MinJi Kim >Assignee: MinJi Kim > > Reading large data set with array/list causes the following problem. This > happens when union type is enabled. > (org.apache.drill.exec.exception.OversizedAllocationException) Unable to > expand the buffer. Max allowed buffer size is reached. 
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214 > org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406 > org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298 > org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307 > org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115 > org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100 > org.apache.drill.exec.vector.complex.ListVector.copyFrom():97 > org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173 > org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1657 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():251 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
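The failure mode in the stack trace above can be reproduced in miniature: a vector buffer that doubles on each reAlloc eventually exceeds the per-vector allocation cap and throws. The cap and starting size below are illustrative assumptions, not Drill's actual limits.

```java
// Miniature model of UInt1Vector.reAlloc()'s doubling growth hitting the
// maximum allowed buffer size. Constants are illustrative only.
class GrowableBuffer {
    static final long MAX_ALLOCATION = 1L << 20; // assumed cap for the sketch
    long allocatedBytes = 4096;

    void reAlloc() {
        long newSize = allocatedBytes * 2;
        if (newSize > MAX_ALLOCATION) {
            throw new IllegalStateException(
                "Unable to expand the buffer. Max allowed buffer size is reached.");
        }
        allocatedBytes = newSize;
    }
}
```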
[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files
[ https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153735#comment-15153735 ] ASF GitHub Bot commented on DRILL-4392: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/383#issuecomment-186053195 Fix looks fine. +1 However, I think we should open a separate bug: this should be fixed in planning; when we add the partition column, we should add a projection to remove this. > CTAS with partition writes an internal field into generated parquet files > - > > Key: DRILL-4392 > URL: https://issues.apache.org/jira/browse/DRILL-4392 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Steven Phillips >Priority: Blocker > > On today's master branch: > {code} > select * from sys.version; > +-+---+-++-++ > | version | commit_id | > commit_message|commit_time > | build_email | build_time | > +-+---+-++-++ > | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: > Remove dependency on drill-logical from vector package | 16.02.2016 @ > 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST | > +-+---+-++- > {code} > Parquet table created by Drill's CTAS statement has one internal field > "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field would not > impact non-star query, but would cause incorrect result for star query. > {code} > use dfs.tmp; > create table nation_ctas partition by (n_regionkey) as select * from > cp.`tpch/nation.parquet`; > select * from dfs.tmp.nation_ctas limit 6; > +--++--+-++ > | n_nationkey | n_name | n_regionkey | > n_comment > | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R | > +--++--+-++ > | 5| ETHIOPIA | 0| ven packages wake quickly. > regu >| true | > | 15 | MOROCCO| 0| rns. blithely bold courts > among the closely regular packages use furiously bold platelets? > | false | > | 14 | KENYA | 0| pending excuses haggle > furiously deposits. 
pending, express pinto beans wake fluffily past t > | false | > | 0| ALGERIA| 0| haggle. carefully final > deposits detect slyly agai > | false | > | 16 | MOZAMBIQUE | 0| s. ironic, unusual > asymptotes wake blithely r >| false | > | 24 | UNITED STATES | 1| y final packages. slow foxes > cajole quickly. quickly silent platelets breach ironic accounts. unusual > pinto be | true > {code} > This basically breaks all the parquet files created by Drill's CTAS with > partition support. > Also, it will also fail one of the Pre-commit functional test [1] > [1] > https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files
[ https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153739#comment-15153739 ] ASF GitHub Bot commented on DRILL-4392: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/383#issuecomment-186058545 Not projection removal but rather writer rewrite. > CTAS with partition writes an internal field into generated parquet files > - > > Key: DRILL-4392 > URL: https://issues.apache.org/jira/browse/DRILL-4392 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Steven Phillips >Priority: Blocker > > On today's master branch: > {code} > select * from sys.version; > +-+---+-++-++ > | version | commit_id | > commit_message|commit_time > | build_email | build_time | > +-+---+-++-++ > | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: > Remove dependency on drill-logical from vector package | 16.02.2016 @ > 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST | > +-+---+-++- > {code} > Parquet table created by Drill's CTAS statement has one internal field > "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field would not > impact non-star query, but would cause incorrect result for star query. > {code} > use dfs.tmp; > create table nation_ctas partition by (n_regionkey) as select * from > cp.`tpch/nation.parquet`; > select * from dfs.tmp.nation_ctas limit 6; > +--++--+-++ > | n_nationkey | n_name | n_regionkey | > n_comment > | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R | > +--++--+-++ > | 5| ETHIOPIA | 0| ven packages wake quickly. > regu >| true | > | 15 | MOROCCO| 0| rns. blithely bold courts > among the closely regular packages use furiously bold platelets? > | false | > | 14 | KENYA | 0| pending excuses haggle > furiously deposits. pending, express pinto beans wake fluffily past t > | false | > | 0| ALGERIA| 0| haggle. 
carefully final > deposits detect slyly agai > | false | > | 16 | MOZAMBIQUE | 0| s. ironic, unusual > asymptotes wake blithely r >| false | > | 24 | UNITED STATES | 1| y final packages. slow foxes > cajole quickly. quickly silent platelets breach ironic accounts. unusual > pinto be | true > {code} > This basically breaks all the parquet files created by Drill's CTAS with > partition support. > Also, it will also fail one of the Pre-commit functional test [1] > [1] > https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores
[ https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153741#comment-15153741 ] ASF GitHub Bot commented on DRILL-4275: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/374 > Refactor e/pstore interfaces and their factories to provide a unified > mechanism to access stores > > > Key: DRILL-4275 > URL: https://issues.apache.org/jira/browse/DRILL-4275 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Reporter: Hanifi Gunes >Assignee: Deneche A. Hakim > > We rely on E/PStore interfaces to persist data. Even though E/PStore stands > for Ephemeral and Persistent stores respectively, the current design for > EStore does not extend the interface/functionality of PStore at all, which > hints that the abstraction for EStore is redundant. This issue proposes a new unified > Store interface replacing the old E/PStore that exposes an additional method > reporting the persistence level, as follows: > {code:title=Store interface} > interface Store<V> { > StoreMode getMode(); > V get(String key); > ... > } > enum StoreMode { > EPHEMERAL, > PERSISTENT, > ... > } > {code} > The new design brings less redundancy and more centralized code, and is > easier to reason about and maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
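A sketch of what a concrete implementation of the proposed unified interface could look like. The `put` method and the map-backed body are assumptions added to make the example runnable; only `getMode()` and `get()` come from the interface quoted in the description.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the unified Store proposal: a single interface, with the
// persistence level reported via getMode(). The put() method and the
// in-memory implementation are illustrative additions.
enum StoreMode { EPHEMERAL, PERSISTENT }

interface Store<V> {
    StoreMode getMode();
    V get(String key);
    void put(String key, V value);
}

class EphemeralStore<V> implements Store<V> {
    private final ConcurrentMap<String, V> map = new ConcurrentHashMap<>();

    @Override public StoreMode getMode() { return StoreMode.EPHEMERAL; }
    @Override public V get(String key) { return map.get(key); }
    @Override public void put(String key, V value) { map.put(key, value); }
}
```

Callers can then branch on `getMode()` instead of depending on separate EStore/PStore abstractions.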
[jira] [Assigned] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores
[ https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes reassigned DRILL-4275: --- Assignee: Hanifi Gunes (was: Deneche A. Hakim) > Refactor e/pstore interfaces and their factories to provide a unified > mechanism to access stores > > > Key: DRILL-4275 > URL: https://issues.apache.org/jira/browse/DRILL-4275 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Reporter: Hanifi Gunes >Assignee: Hanifi Gunes > > We rely on E/PStore interfaces to persist data. Even though E/PStore stands > for Ephemeral and Persistent stores respectively, the current design for > EStore does not extend the interface/functionality of PStore at all, which > hints that the abstraction for EStore is redundant. This issue proposes a new unified > Store interface replacing the old E/PStore that exposes an additional method > reporting the persistence level, as follows: > {code:title=Store interface} > interface Store<V> { > StoreMode getMode(); > V get(String key); > ... > } > enum StoreMode { > EPHEMERAL, > PERSISTENT, > ... > } > {code} > The new design brings less redundancy and more centralized code, and is > easier to reason about and maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores
[ https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanifi Gunes updated DRILL-4275: Issue Type: Sub-task (was: Improvement) Parent: DRILL-4186 > Refactor e/pstore interfaces and their factories to provide a unified > mechanism to access stores > > > Key: DRILL-4275 > URL: https://issues.apache.org/jira/browse/DRILL-4275 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Flow >Reporter: Hanifi Gunes >Assignee: Hanifi Gunes > > We rely on E/PStore interfaces to persist data. Even though E/PStore stands > for Ephemeral and Persistent stores respectively, the current design for > EStore does not extend the interface/functionality of PStore at all, which > hints that the abstraction for EStore is redundant. This issue proposes a new unified > Store interface replacing the old E/PStore that exposes an additional method > reporting the persistence level, as follows: > {code:title=Store interface} > interface Store<V> { > StoreMode getMode(); > V get(String key); > ... > } > enum StoreMode { > EPHEMERAL, > PERSISTENT, > ... > } > {code} > The new design brings less redundancy and more centralized code, and is > easier to reason about and maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)