[jira] [Commented] (DRILL-2223) Empty parquet file created with Limit 0 query errors out when querying
[ https://issues.apache.org/jira/browse/DRILL-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704613#comment-15704613 ] SAIKRISHNA commented on DRILL-2223: --- Is there any way to create an empty parquet file with schema and zero records? For my business use case we need to create an empty parquet file with its schema, just as one is created for JSON with zero records. I am trying the query below and getting zero records:

create table target.HIVE.employeeTest2911 AS SELECT * FROM cp.`employee.json` where employee_id > 1157

Fragment | Number of records written
0_0 | 0

When I then try

select * from target.HIVE.employeeTest2911

I get this exception:

org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column 16 to line 1, column 21: Table 'target.HIVE.employeeTest2911' not found SQL Query null [Error Id: 5ee67a9b-b3ec-4ac8-88bd-13d8428f1d48 on DataNode1:31010]

The workspace structure is:

{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://XXX:8020",
  "config": null,
  "workspaces": {
    "HIVE": {
      "location": "/user/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}

If anyone has a solution to overcome this, please let me know. Thanks in advance.

> Empty parquet file created with Limit 0 query errors out when querying
> ----------------------------------------------------------------------
>
> Key: DRILL-2223
> URL: https://issues.apache.org/jira/browse/DRILL-2223
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 0.7.0
> Reporter: Aman Sinha
> Fix For: Future
>
> Doing a CTAS with limit 0 creates a 0 length parquet file which errors out during querying. This should at least write the schema information and metadata which will allow queries to run.
> {code}
> 0: jdbc:drill:zk=local> create table tt_nation2 as select n_nationkey, n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0;
> +----------+---------------------------+
> | Fragment | Number of records written |
> +----------+---------------------------+
> | 0_0      | 0                         |
> +----------+---------------------------+
> 1 row selected (0.315 seconds)
> 0: jdbc:drill:zk=local> select n_nationkey from tt_nation2;
> Query failed: RuntimeException: file:/tmp/tt_nation2/0_0_0.parquet is not a Parquet file (too small)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (DRILL-5078) use Custom Functions errors
[ https://issues.apache.org/jira/browse/DRILL-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva closed DRILL-5078. --- Resolution: Not A Bug

> use Custom Functions errors
> ---------------------------
>
> Key: DRILL-5078
> URL: https://issues.apache.org/jira/browse/DRILL-5078
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.8.0
> Environment: Windows 7
> Reporter: mircoteam
> Priority: Trivial
>
> I defined a function to change encoding from UTF-8 to GBK.
> When I put its classes and source code into 3rdparty and use it in a query like this:
> "SELECT encode_translate(columns[0],'UTF-8','GBK') as aaa FROM dfs.`d:/drill_test.csv` LIMIT 20"
> it returns an error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: CompileException: Line 92, Column 42: Cannot determine simple type name "UnsupportedEncodingException" Fragment 0:0 [Error Id: 599d0e39-f05a-4ecd-a539-b5338239d63b on XXX..com:31010].
> This is the source code of eval():
> public void eval() {
>   // get the value and replace with
>   String stringValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(input.start, input.end, input.buffer);
>   String fromEncodeValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(fromEncode);
>   String toEncodeValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(toEncode);
>   try {
>     String toEncodeStringValue = new String(stringValue.getBytes(fromEncodeValue), toEncodeValue);
>     out.buffer = buffer;
>     out.start = 0;
>     out.end = toEncodeStringValue.getBytes().length;
>     buffer.setBytes(0, toEncodeStringValue.getBytes());
>   } catch (UnsupportedEncodingException e) {
>   }
> }
> Please help me, thank you.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
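The "Cannot determine simple type name" error above is the usual symptom of a Drill UDF constraint rather than a Drill bug (which is presumably why the issue was closed as Not A Bug): Drill merges the UDF body into generated source where imports are not available, so every class referenced inside eval() must be fully qualified, e.g. `catch (java.io.UnsupportedEncodingException e)`. The sketch below is a hypothetical standalone version of the transcoding logic (class and method names are illustrative, not Drill's API); it uses `java.nio.charset.Charset` so the checked exception disappears entirely:

```java
import java.nio.charset.Charset;

// Standalone sketch of the encode_translate logic (not a Drill UDF).
// Inside a real Drill UDF body, class names must be fully qualified
// because the code is pasted into generated source with no imports.
public class EncodeTranslate {
    public static String translate(String value, String fromEncode, String toEncode) {
        // Re-interpret the string's bytes (encoded in the source charset)
        // as text in the target charset. Charset.forName throws an
        // unchecked exception for unknown names, so no checked catch needed.
        byte[] bytes = value.getBytes(Charset.forName(fromEncode));
        return new String(bytes, Charset.forName(toEncode));
    }

    public static void main(String[] args) {
        // ASCII text survives unchanged because GBK is ASCII-compatible
        System.out.println(translate("drill", "UTF-8", "GBK"));
    }
}
```

An empty catch block as in the quoted code also silently drops the error; rethrowing (or avoiding the checked exception as above) makes failures visible.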
[jira] [Reopened] (DRILL-5078) use Custom Functions errors
[ https://issues.apache.org/jira/browse/DRILL-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reopened DRILL-5078: - -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5078) use Custom Functions errors
[ https://issues.apache.org/jira/browse/DRILL-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704800#comment-15704800 ] Arina Ielchiieva commented on DRILL-5078: - [~zhenTan] you might want to subscribe to the Drill user / dev mailing lists (https://drill.apache.org/mailinglists/) and post questions there first when you are not sure whether it is a Drill bug or a bug in your implementation, as in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected
[ https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4604: Fix Version/s: (was: Future) 1.10.0

> Generate warning on Web UI if drillbits version mismatch is detected
> --------------------------------------------------------------------
>
> Key: DRILL-4604
> URL: https://issues.apache.org/jira/browse/DRILL-4604
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.6.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
> Attachments: NEW_matching_drillbits.JPG, NEW_mismatching_drillbits.JPG, index_page.JPG, index_page_mismatch.JPG, screenshots_with_different_states.docx
>
> Display the drillbit version on the web UI. If any drillbit's version does not match the current drillbit's, generate a warning.
> Screenshots - screenshots_with_different_states.docx.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected
[ https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva reassigned DRILL-4604: --- Assignee: Arina Ielchiieva (was: Paul Rogers) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4604) Generate warning on Web UI if drillbits version mismatch is detected
[ https://issues.apache.org/jira/browse/DRILL-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-4604: Labels: doc-impacting ready-to-commit (was: doc-impacting) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered
[ https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705313#comment-15705313 ] ASF GitHub Bot commented on DRILL-5044: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/669#discussion_r89994657

--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestDynamicUDFSupport.java ---
@@ -271,6 +271,75 @@ public void testDuplicatedFunctionsInLocalRegistry() throws Exception {
   }

   @Test
+  public void testSuccessfulRegistrationAfterSeveralRetryAttempts() throws Exception {
--- End diff --

Actually the test `testSuccessfulRegistrationAfterSeveralRetryAttempts()` covers this case. To reproduce this issue we don't need several drillbits; it's enough to issue several concurrent create function commands. As for this test case, I just mock the method that updates the remote function registry to throw VersionMismatchException two times and then call the method for real.

> After the dynamic registration of multiple jars simultaneously not all UDFs were registered
> -------------------------------------------------------------------------------------------
>
> Key: DRILL-5044
> URL: https://issues.apache.org/jira/browse/DRILL-5044
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.9.0
> Reporter: Roman
> Assignee: Arina Ielchiieva
>
> I tried to register 21 jars simultaneously (property 'udf.retry-attempts' = 30) and not all jars were registered. As I saw in the output, all functions were registered and the /staging directory was empty, but not all of the jars were moved into the /registry directory.
> For example, after simultaneous registration I saw the "The following UDFs in jar test-1.1.jar have been registered: [test1(VARCHAR-REQUIRED)" message, but this jar was not in the /registry directory. When I tried to run function test1, I got this error: "Error: SYSTEM ERROR: SqlValidatorException: No match found for function signature test1()". And when I tried to reregister this jar, I got "Jar with test-1.1.jar name has been already registered".

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered
[ https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705312#comment-15705312 ] ASF GitHub Bot commented on DRILL-5044: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/669#discussion_r89993422

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DropFunctionHandler.java ---
@@ -143,7 +143,7 @@ private Jar unregister(String jarName, RemoteFunctionRegistry remoteFunctionRegistry
     if (retryAttempts-- == 0) {
       throw new DrillRuntimeException("Failed to update remote function registry. Exceeded retry attempts limit.");
     }
-    unregister(jarName, remoteFunctionRegistry, retryAttempts);
+    return unregister(jarName, remoteFunctionRegistry, retryAttempts);
--- End diff --

Agree. Done.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered
[ https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705311#comment-15705311 ] ASF GitHub Bot commented on DRILL-5044: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/669#discussion_r89995323

--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestDynamicUDFSupport.java ---
@@ -271,6 +271,75 @@ public void testDuplicatedFunctionsInLocalRegistry() throws Exception {
   }

   @Test
+  public void testSuccessfulRegistrationAfterSeveralRetryAttempts() throws Exception {
+    RemoteFunctionRegistry remoteFunctionRegistry = spyRemoteFunctionRegistry();
+    copyDefaultJarsToStagingArea();
+
+    doThrow(new VersionMismatchException("Version mismatch detected", 1))
+        .doThrow(new VersionMismatchException("Version mismatch detected", 1))
+        .doCallRealMethod()
+        .when(remoteFunctionRegistry).updateRegistry(any(Registry.class), any(DataChangeVersion.class));
+
+    String summary = "The following UDFs in jar %s have been registered:\n" +
+        "[custom_lower(VARCHAR-REQUIRED)]";
+
+    testBuilder()
+        .sqlQuery("create function using jar '%s'", default_binary_name)
+        .unOrdered()
+        .baselineColumns("ok", "summary")
+        .baselineValues(true, String.format(summary, default_binary_name))
+        .go();
+
+    verify(remoteFunctionRegistry, times(3))
+        .updateRegistry(any(Registry.class), any(DataChangeVersion.class));
+
+    FileSystem fs = remoteFunctionRegistry.getFs();
+
+    assertFalse("Staging area should be empty", fs.listFiles(remoteFunctionRegistry.getStagingArea(), false).hasNext());
+    assertFalse("Temporary area should be empty", fs.listFiles(remoteFunctionRegistry.getTmpArea(), false).hasNext());
+
+    assertTrue("Binary should be present in registry area",
+        fs.exists(new Path(remoteFunctionRegistry.getRegistryArea(), default_binary_name)));
+    assertTrue("Source should be present in registry area",
+        fs.exists(new Path(remoteFunctionRegistry.getRegistryArea(), default_source_name)));
+
+    Registry registry = remoteFunctionRegistry.getRegistry();
+    assertEquals("Registry should contain one jar", registry.getJarList().size(), 1);
+    assertEquals(registry.getJar(0).getName(), default_binary_name);
+  }
+
+  @Test
+  public void testSuccessfulUnregistrationAfterSeveralRetryAttempts() throws Exception {
+    RemoteFunctionRegistry remoteFunctionRegistry = spyRemoteFunctionRegistry();
+    copyDefaultJarsToStagingArea();
+    test("create function using jar '%s'", default_binary_name);
+
+    reset(remoteFunctionRegistry);
+    doThrow(new VersionMismatchException("Version mismatch detected", 1))
--- End diff --

It's Mockito functionality. You can mock a method to throw a failure or return any result when it is called. In this case we mock the `updateRegistry()` method to throw VersionMismatchException the first two times. This way we simulate the situation where someone has updated the remote function registry before us. After that we instruct it to call the real method.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
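The Mockito stubbing described above (`doThrow().doThrow().doCallRealMethod()`) can be illustrated without Mockito by hand-rolling the same scripted-failure idea. The sketch below is illustrative only; the class and method names are hypothetical stand-ins, not Drill's actual registry classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-rolled sketch of scripted stubbing: the first N calls throw a
// queued exception, after which calls fall through to the real behavior.
public class ScriptedRegistry {
    public static class VersionMismatchException extends RuntimeException {
        public VersionMismatchException(String msg) { super(msg); }
    }

    private final Deque<RuntimeException> scriptedFailures = new ArrayDeque<>();
    public int updateCalls = 0;   // how many times updateRegistry() was invoked

    // queue one failure, analogous to one chained doThrow(...)
    public void failNextCallWith(RuntimeException e) {
        scriptedFailures.add(e);
    }

    public void updateRegistry() {
        updateCalls++;
        RuntimeException next = scriptedFailures.poll();
        if (next != null) {
            throw next;   // scripted failure
        }
        // no queued failure left: "real" behavior (a no-op in this sketch),
        // analogous to doCallRealMethod()
    }
}
```

A retrying caller then observes two failures followed by success on the third call, which is exactly what the `times(3)` verification in the quoted test asserts.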
[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered
[ https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705310#comment-15705310 ] ASF GitHub Bot commented on DRILL-5044: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/669#discussion_r89994171

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateFunctionHandler.java ---
@@ -175,22 +175,20 @@ private void initRemoteRegistration(List functions,
     List remoteJars = remoteRegistry.getRegistry(version).getJarList();
     validateAgainstRemoteRegistry(remoteJars, jarManager.getBinaryName(), functions);
     jarManager.copyToRegistryArea();
-    boolean cleanUp = true;
     List jars = Lists.newArrayList(remoteJars);
     jars.add(Jar.newBuilder().setName(jarManager.getBinaryName()).addAllFunctionSignature(functions).build());
     Registry updatedRegistry = Registry.newBuilder().addAllJar(jars).build();
     try {
       remoteRegistry.updateRegistry(updatedRegistry, version);
     } catch (VersionMismatchException ex) {
+      jarManager.deleteQuietlyFromRegistryArea();
--- End diff --

1. I guess having a fixed number of retries is enough. Retry-and-wait logic may lead us to a point where the user has to wait a long time for registration to complete on a busy system. With retry-only logic we notify the user quickly that the system is busy, and it is up to the user to decide when to try to register the function again.
2. Totally agree about the recursion: since the user may modify the number of retry attempts, it is much better to have a while loop to avoid stack overflow.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
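The retry policy the review settles on — a fixed retry budget, implemented as a loop rather than recursion so a user-configured `udf.retry-attempts` value cannot overflow the stack — can be sketched minimally as below. All names are illustrative, not Drill's actual handler code:

```java
// Minimal sketch of a bounded, iterative retry policy: retry only on
// version-mismatch conflicts, fail fast once the budget is exhausted,
// and never recurse so the stack depth is constant.
public class RetryLoop {
    public static class VersionMismatchException extends RuntimeException {}

    public interface RegistryUpdate {
        void run();   // throws VersionMismatchException on a lost race
    }

    // returns the number of attempts it took to succeed
    public static int updateWithRetries(RegistryUpdate update, int retryAttempts) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                update.run();
                return attempts;   // success
            } catch (VersionMismatchException e) {
                if (retryAttempts-- == 0) {
                    throw new RuntimeException(
                        "Failed to update remote function registry. Exceeded retry attempts limit.");
                }
                // budget remains: loop and retry instead of recursing
            }
        }
    }
}
```

With this shape, notifying the user quickly on a busy system is just a matter of keeping the budget small; there is no sleep between attempts, matching the retry-only (no wait) choice argued for above.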
[jira] [Commented] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered
[ https://issues.apache.org/jira/browse/DRILL-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705314#comment-15705314 ] ASF GitHub Bot commented on DRILL-5044: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/669#discussion_r89993367

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/CreateFunctionHandler.java ---
@@ -175,22 +175,20 @@ private void initRemoteRegistration(List functions,
     try {
       remoteRegistry.updateRegistry(updatedRegistry, version);
-      cleanUp = false;
     } catch (VersionMismatchException ex) {
+      jarManager.deleteQuietlyFromRegistryArea();
       if (retryAttempts-- == 0) {
         throw new DrillRuntimeException("Failed to update remote function registry. Exceeded retry attempts limit.");
       }
       initRemoteRegistration(functions, jarManager, remoteRegistry, retryAttempts);
-    } finally {
-      if (cleanUp) {
-        jarManager.deleteQuietlyFromRegistryArea();
-      }
+    } catch (Exception e) {
--- End diff --

You are right. Updated the code, so now we delete jars only once on error.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705810#comment-15705810 ] Aman Sinha commented on DRILL-4347: --- The jstack is long because of the complex query. It shows that the planner is stuck during Calcite's {{ReflectiveRelMetadataProvider.apply()}} call during the post-processing phase of Drill planning. At this phase, the logical and physical planning are done and the planner is in the SwapHashJoin phase. During this phase, it calls getRows() on the inputs of all the hash joins to make its decisions. getRows() eventually calls {{RelMdDistinctRowCount.getDistinctRowCount()}} since there is a GROUP-BY, and the row count of a grouped aggregate is determined by the number of distinct rows for its group-by columns. Note that Calcite needs the distinct row count also from the Join operators (not just Aggregates) if the output of the Join is feeding into an Aggregate. It is unclear what the root cause is of the Calcite call being either stuck or taking too long (there could be some issues with the deeply nested reflective calls), but one important observation is that Drill is needlessly doing this computation twice - once during the logical planning phase and once during physical planning. The distinct row count of all the Joins can be computed during logical planning and cached for future use during physical planning, because this value is not going to change. For complex queries such as these with many joins, it also saves planning time. I am proposing to fix the issue by caching the distinct row count for Joins.
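The proposed fix — compute each Join's distinct row count once during logical planning and reuse it during physical planning — is essentially memoization keyed by the join node. A minimal sketch of that idea (illustrative only; not Drill's or Calcite's actual planner classes, and using a plain integer key in place of a RelNode):

```java
import java.util.HashMap;
import java.util.Map;

// Memoization sketch for the caching proposal: the expensive metadata
// computation runs at most once per join, and later planning phases
// read the cached value. Valid because the join's logical inputs (and
// thus the estimate) do not change between logical and physical planning.
public class DistinctRowCountCache {
    private final Map<Integer, Double> cache = new HashMap<>();
    public int computeCalls = 0;   // counts how often the expensive path runs

    // stand-in for the deeply recursive metadata computation
    private double computeDistinctRowCount(int joinId) {
        computeCalls++;
        return 1000.0 * (joinId + 1);   // arbitrary placeholder estimate
    }

    public double getDistinctRowCount(int joinId) {
        return cache.computeIfAbsent(joinId, this::computeDistinctRowCount);
    }
}
```

For a query like the TPC-DS query64 below, with many joins each queried once per planning phase, this halves (or better) the number of expensive metadata traversals.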
> Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Victoria Markman
> Assignee: Aman Sinha
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . > AS (SELECT cs_item_sk,
> . . . . . . . . . . . . >            Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . >            Sum(cr_refunded_cash + cr_reversed_charge
> . . . . . . . . . . . . >                + cr_store_credit) AS refund
> . . . . . . . . . . . . >     FROM catalog_sales,
> . . . . . . . . . . . . >          catalog_returns
> . . . . . . . . . . . . >     WHERE cs_item_sk = cr_item_sk
> . . . . . . . . . . . . >       AND cs_order_number = cr_order_number
> . . . . . . . . . . . . >     GROUP BY cs_item_sk
> . . . . . . . . . . . . >     HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . >            cr_refunded_cash + cr_reversed_charge
> . . . . . . . . . . . . >            + cr_store_credit)),
> . . . . . . . . . . . . > cross_sales
> . . . . . . . . . . . . > AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . >            i_item_sk item_sk,
> . . . . . . . . . . . . >            s_store_name store_name,
> . . . . . . . . . . . . >            s_zip store_zip,
> . . . . . . . . . . . . >            ad1.ca_street_number b_street_number,
> . . . . . . . . . . . . >            ad1.ca_street_name b_streen_name,
> . . . . . . . . . . . . >            ad1.ca_city b_city,
> . . . . . . . . . . . . >            ad1.ca_zip b_zip,
> . . . . . . . . . . . . >            ad2.ca_street_number c_street_number,
> . . . . . . . . . . . . >            ad2.ca_street_name c_street_name,
> . . . . . . . . . . . . >            ad2.ca_city c_city,
> . . . . . . . . . . . . >            ad2.ca_zip c_zip,
> . . . . . . . . . . . . >            d1.d_year AS syear,
> . . . . . . . . . . . . >            d2.d_year AS fsyear,
> . . . . . . . . . . . . >            d3.d_year s2year,
> . . . . . . . . . . . . >            Count(*) cnt,
> . . . . . . . . . . . . >            Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . >            Sum(ss_list_price) s2,
> . . . . . . . . . . . . >            Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >     FROM store_sales,
> . . . . . . . . . . . . >          store_return
[jira] [Comment Edited] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705810#comment-15705810 ] Aman Sinha edited comment on DRILL-4347 at 11/29/16 4:48 PM: - The jstack is long because of the complex query. It shows that the planner is stuck during Calcite's {{ReflectiveRelMetadataProvider.apply()}} call during the post-processing phase of Drill planning. At this phase, the logical and physical planning are done and the planner is in the SwapHashJoin phase. During this phase, it calls getRows() on the inputs of all the hash joins to make its decisions. getRows() eventually calls {{RelMdDistinctRowCount.getDistinctRowCount()}} since there is a GROUP-BY, and the row count of a grouped aggregate is determined by the number of distinct rows for its group-by columns. Note that Calcite needs the distinct row count also from the Join operators (not just Aggregates) if the output of the Join is feeding into an Aggregate. Note that the stack trace is different from a similar (but not the same) issue reported in CALCITE-1053. It is unclear what the root cause is of the deeply nested reflective calls getting stuck, but one important observation is that Drill is needlessly doing this computation twice - once during the logical planning phase and once during physical planning. The distinct row count of all the Joins can be computed during logical planning and cached for future use during physical planning, because this value is not going to change. For complex queries such as these with many joins, it also saves planning time. I am proposing to fix the issue by caching the distinct row count for Joins. was (Author: amansinha100): The jstack is long because of the complex query. It shows that the planner is stuck during Calcite's {bq} ReflectiveRelMetadataProvider.apply() {bq} call during the post-processing phase of Drill planning.
At this phase, the logical and physical planning are done and planner is in SwapHashJoin phase. During this, it calls getRows() on the inputs of all the hash joins to makes its decisions. The getRows() eventually calls {bq}RelMdDistinctRowCount.getDistinctRowCount(){bq} since there is a GROUP-BY and the row count of a grouped aggregate is determined by the number of distinct rows for its group-by columns. Note that Calcite needs the distinct row count also from the Join operators (not just Aggregates) if the output of the Join is feeding into an Aggregate. It is unclear what is the root cause of the Calcite call either stuck or taking too long (there could be some issues with the deeply nested reflexive calls), but one important observation is that Drill is needlessly doing this computation twice - once during logical planning phase and once during physical planning. The distinct row count of all the Joins can be computed during logical planning and cached for future use during physical planning because this value is not going to change. For complex queries such as these with many joins, it also saves planning time. I am proposing to fix the issue by doing this caching of distinct row count for Joins. > Planning time for query64 from TPCDS test suite has increased 10 times > compared to 1.4 release > -- > > Key: DRILL-4347 > URL: https://issues.apache.org/jira/browse/DRILL-4347 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.5.0 >Reporter: Victoria Markman >Assignee: Aman Sinha > Fix For: Future > > Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, > 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt > > > mapr-drill-1.5.0.201602012001-1.noarch.rpm > {code} > 0: jdbc:drill:schema=dfs> WITH cs_ui > . . . . . . . . . . . . > AS (SELECT cs_item_sk, > . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale, > . . . . . . . . . . . . 
> Sum(cr_refunded_cash + > cr_reversed_charge > . . . . . . . . . . . . > + cr_store_credit) AS refund > . . . . . . . . . . . . > FROM catalog_sales, > . . . . . . . . . . . . > catalog_returns > . . . . . . . . . . . . > WHERE cs_item_sk = cr_item_sk > . . . . . . . . . . . . > AND cs_order_number = > cr_order_number > . . . . . . . . . . . . > GROUP BY cs_item_sk > . . . . . . . . . . . . > HAVING Sum(cs_ext_list_price) > 2 * Sum( > . . . . . . . . . . . . > cr_refunded_cash + > cr_reversed_charge > . . . . . . . . . . . . > + cr_store_credit)), > . . . . . . . . . . . . > cross_sales > . . . . . . . . . .
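The caching proposed in the comment above can be pictured as a memoized metadata call on the join node itself. Below is a simplified, self-contained model of the idea; the class and method names are illustrative, not Drill's or Calcite's actual API:

```java
// Simplified model of the proposed fix (invented names, not Drill's or
// Calcite's actual API): compute the distinct row count of a join once,
// during logical planning, and reuse the cached value during physical
// planning instead of re-deriving it.
class JoinNode {
    private Double cachedDistinctRowCount; // null until first computed
    int expensiveCalls = 0;                // counts metadata derivations

    double getDistinctRowCount() {
        if (cachedDistinctRowCount == null) {
            cachedDistinctRowCount = computeDistinctRowCount();
        }
        return cachedDistinctRowCount;
    }

    // Stand-in for the recursive RelMdDistinctRowCount.getDistinctRowCount()
    // walk that the jstack shows getting stuck.
    private double computeDistinctRowCount() {
        expensiveCalls++;
        return 42.0; // placeholder estimate
    }
}

class DistinctRowCountCacheDemo {
    public static void main(String[] args) {
        JoinNode join = new JoinNode();
        double logical = join.getDistinctRowCount();  // logical planning phase
        double physical = join.getDistinctRowCount(); // physical phase hits the cache
        System.out.println(logical == physical && join.expensiveCalls == 1); // prints true
    }
}
```

The key property is that the expensive derivation runs exactly once even though both planning phases ask for the value.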
[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705827#comment-15705827 ] ASF GitHub Bot commented on DRILL-4347: --- GitHub user amansinha100 opened a pull request: https://github.com/apache/drill/pull/671 DRILL-4347: Propagate distinct row count for joins from logical plann… …ing to physical planning phase. You can merge this pull request into a Git repository by running: $ git pull https://github.com/amansinha100/incubator-drill DRILL-4347-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/671.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #671 commit f2036036b7e61bc100ab23da5003c32350327c08 Author: Aman Sinha Date: 2016-11-28T16:42:15Z DRILL-4347: Propagate distinct row count for joins from logical planning to physical planning phase.
. . . . . . . > AND cs_order_number = > cr_order_number > . . . . . . . . . . . . > GROUP BY cs_item_sk > . . . . . . . . . . . . > HAVING Sum(cs_ext_list_price) > 2 * Sum( > . . . . . . . . . . . . > cr_refunded_cash + > cr_reversed_charge > . . . . . . . . . . . . > + cr_store_credit)), > . . . . . . . . . . . . > cross_sales > . . . . . . . . . . . . > AS (SELECT i_product_name product_name, > . . . . . . . . . . . . > i_item_sk item_sk, > . . . . . . . . . . . . > s_store_name store_name, > . . . . . . . . . . . . > s_zip store_zip, > . . . . . . . . . . . . > ad1.ca_street_number > b_street_number, > . . . . . . . . . . . . > ad1.ca_street_name > b_streen_name, > . . . . . . . . . . . . > ad1.ca_cityb_city, > . . . . . . . . . . . . > ad1.ca_zip b_zip, > . . . . . . . . . . . . > ad2.ca_street_number > c_street_number, > . . . . . . . . . . . . > ad2.ca_street_name > c_street_name, > . . . . . . . . . . . . > ad2.ca_cityc_city, > . . . . . . . . . . . . > ad2.ca_zip c_zip, > . . . . . . . . . . . . > d1.d_year AS syear, > . . . . . . . . . . . . > d2.d_year AS fsyear, > . . . . . . . . . . . . > d3.d_year s2year, > . . . . . . . . . . . . > Count(*) cnt, > . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1, > . . . . . . . . . . . . > Sum(ss_list_price) s2, > . . . . . . . . . . . . > Sum(ss_coupon_amt) s3 > . . . . . . . . . . . . > FROM store_sales, > . . . . . . . . . . . . > store_returns, > . . . . . . . . . . . . > cs_ui, > . . . . . . . . . . . . > date_dim d1, > . . . . . . . . . . . . > date_dim d2, > . . . . . . . . . . . . > date_dim d3, > . . . . . . . . . . . . > store, > . . . . . . . . . . . . > customer, > . . . . . . . . . . . . > customer_demographics cd1, > . . . . . . . . . . . . > customer_demographics cd2, > . . . . . . . . . . . . > promotion, > . . . . . . . . . . . . > household_demographics hd1, > . . . .
[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705857#comment-15705857 ] ASF GitHub Bot commented on DRILL-4347: --- Github user jacques-n commented on the issue: https://github.com/apache/drill/pull/671 Random question: won't a caching metadata provider achieve the same thing without having to do per rel node changes? At least that is what I thought it was for. Maybe @julianhyde can provide additional feedback?
[jira] [Updated] (DRILL-5077) Memory Leak - when foreman Drillbit is stopped
[ https://issues.apache.org/jira/browse/DRILL-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua updated DRILL-5077: Reviewer: Paul Rogers > Memory Leak - when foreman Drillbit is stopped > -- > > Key: DRILL-5077 > URL: https://issues.apache.org/jira/browse/DRILL-5077 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.9.0 > Environment: 4 node cluster CentOS >Reporter: Khurram Faraaz >Priority: Blocker > > terminating foreman drillbit while a query is running results in memory leak > Drill 1.9.0 git commit id: 4312d65b > Stack trace from drillbit.log > {noformat} > 2016-11-28 06:12:45,338 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 27c43522-79ca-989f-3659-d7ccbc77e2e7: select count(*) from `twoKeyJsn.json` > 2016-11-28 06:12:45,602 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, > numFiles: 1 > 2016-11-28 06:12:45,602 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, > numFiles: 1 > 2016-11-28 06:12:45,603 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, > numFiles: 1 > 2016-11-28 06:12:45,603 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, > numFiles: 1 > 2016-11-28 06:12:45,603 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, > numFiles: 1 > 2016-11-28 06:12:45,633 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, > numFiles: 1 > 2016-11-28 06:12:45,669 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > 
o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 > using 1 threads. Time: 33ms total, 33.494123ms avg, 33ms max. > 2016-11-28 06:12:45,669 [27c43522-79ca-989f-3659-d7ccbc77e2e7:foreman] INFO > o.a.d.e.s.schedule.BlockMapBuilder - Get block maps: Executed 1 out of 1 > using 1 threads. Earliest start: 9.54 μs, Latest start: 9.54 μs, > Average start: 9.54 μs . > 2016-11-28 06:12:45,913 [27c43522-79ca-989f-3659-d7ccbc77e2e7:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 27c43522-79ca-989f-3659-d7ccbc77e2e7:0:0: State change requested > AWAITING_ALLOCATION --> RUNNING > 2016-11-28 06:12:45,913 [27c43522-79ca-989f-3659-d7ccbc77e2e7:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 27c43522-79ca-989f-3659-d7ccbc77e2e7:0:0: State to report: RUNNING > Mon Nov 28 06:12:48 UTC 2016 Terminating drillbit pid 28004 > 2016-11-28 06:12:48,697 [Drillbit-ShutdownHook#0] INFO > o.apache.drill.exec.server.Drillbit - Received shutdown request. > 2016-11-28 06:12:55,749 [pool-6-thread-2] INFO > o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup > io.netty.channel.nio.NioEventLoopGroup@15bcfacd in 1017 ms > 2016-11-28 06:12:55,750 [pool-6-thread-2] INFO > o.a.drill.exec.service.ServiceEngine - closed dataPool in 1018 ms > 2016-11-28 06:12:57,749 [Drillbit-ShutdownHook#0] WARN > o.apache.drill.exec.work.WorkManager - Closing WorkManager but there are 1 > running fragments. 
> 2016-11-28 06:12:57,751 [27c43522-79ca-989f-3659-d7ccbc77e2e7:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 27c43522-79ca-989f-3659-d7ccbc77e2e7:0:0: State change requested RUNNING --> > FAILED > 2016-11-28 06:12:57,751 [27c43522-79ca-989f-3659-d7ccbc77e2e7:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 27c43522-79ca-989f-3659-d7ccbc77e2e7:0:0: State change requested FAILED --> > FINISHED > 2016-11-28 06:12:57,756 [27c43522-79ca-989f-3659-d7ccbc77e2e7:frag:0:0] ERROR > o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException > Fragment 0:0 > [Error Id: 2df2f9a1-a7bf-4454-a31b-717ab4ebd815 on centos-01.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > NullPointerException > Fragment 0:0 > [Error Id: 2df2f9a1-a7bf-4454-a31b-717ab4ebd815 on centos-01.qa.lab:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.9.0.jar:1.9.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) > [drill-java-exec-1.9.0.jar:1.9.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) > [drill-java-exec-1.9.0.jar:1.9.0] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExec
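The log above shows the WorkManager closing while a fragment is still running, after which the fragment's cleanup hits a NullPointerException. One way such a leak pattern can be avoided is to drain running fragments before tearing down shared resources. The sketch below is a generic model of that idea (invented names, not Drill's actual WorkManager API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (invented names, not Drill's WorkManager API) of a shutdown path
// that waits, up to a deadline, for running fragments to finish before
// releasing shared resources such as buffer allocators.
class WorkManagerModel {
    private final AtomicInteger runningFragments = new AtomicInteger();

    void fragmentStarted()  { runningFragments.incrementAndGet(); }
    void fragmentFinished() { runningFragments.decrementAndGet(); }

    /** Returns true if all fragments drained before the timeout elapsed. */
    boolean awaitQuiescence(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (runningFragments.get() > 0) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // caller may then cancel fragments forcibly
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

class ShutdownDrainDemo {
    public static void main(String[] args) {
        WorkManagerModel wm = new WorkManagerModel();
        wm.fragmentStarted();
        wm.fragmentFinished();
        System.out.println(wm.awaitQuiescence(100)); // prints true
    }
}
```

If the timeout expires with fragments still running, a real implementation would cancel them and wait again, rather than proceeding to close allocators out from under them.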
[jira] [Updated] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )
[ https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-5083: --- Reviewer: Sorabh Hamirwasia > IteratorValidator does not handle RecordIterator cleanup call to next( ) > > > Key: DRILL-5083 > URL: https://issues.apache.org/jira/browse/DRILL-5083 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Priority: Minor > > This one is very confusing... > In a test with a MergeJoin and external sort, operators are stacked something > like this: > {code} > Screen > - MergeJoin > - - External Sort > ... > {code} > Using the injector to force an OOM in spill, the external sort threw a > UserException up the stack. This was handled by: > {code} > IteratorValidatorBatchIterator.next( ) > RecordIterator.clearInflightBatches( ) > RecordIterator.close( ) > MergeJoinBatch.close( ) > {code} > Which does the following: > {code} > // Check whether next() should even have been called in current state. > if (null != exceptionState) { > throw new IllegalStateException( > {code} > But exceptionState is set, so we end up throwing an > IllegalStateException during cleanup. > It seems the code should agree: if {{next( )}} will be called during cleanup, > then {{next( )}} should gracefully handle that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
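The graceful handling suggested in the description could look like the following simplified model (invented names, not Drill's actual classes): once cleanup has begun, a call to next() with a pending exception state reports end-of-data instead of throwing.

```java
// Simplified model (invented names, not Drill's actual classes) of the
// suggested fix: during cleanup, next() with a pending exceptionState
// drains gracefully instead of throwing IllegalStateException.
enum IterOutcome { OK, NONE }

class ValidatingIterator {
    private Exception exceptionState;
    private boolean closing;

    void fail(Exception e) { exceptionState = e; }
    void beginClose()      { closing = true; }

    IterOutcome next() {
        if (exceptionState != null) {
            if (closing) {
                return IterOutcome.NONE; // cleanup path: report end-of-data
            }
            // Genuine misuse: next() after failure outside of cleanup.
            throw new IllegalStateException("next() called after failure",
                                            exceptionState);
        }
        return IterOutcome.OK;
    }
}

class CleanupNextDemo {
    public static void main(String[] args) {
        ValidatingIterator it = new ValidatingIterator();
        it.fail(new RuntimeException("injected OOM during spill"));
        it.beginClose();
        System.out.println(it.next()); // prints NONE instead of throwing
    }
}
```

The validation check stays intact for the non-cleanup case, so genuine protocol violations are still caught.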
[jira] [Updated] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )
[ https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-5083: --- Reviewer: (was: Sorabh Hamirwasia) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )
[ https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers reassigned DRILL-5083: -- Assignee: Paul Rogers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )
[ https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers updated DRILL-5083: --- Assignee: (was: Paul Rogers) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4984) Limit 0 raises NullPointerException on JDBC storage sources
[ https://issues.apache.org/jira/browse/DRILL-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holger Kiel updated DRILL-4984: --- Fix Version/s: (was: 1.9.0) > Limit 0 raises NullPointerException on JDBC storage sources > --- > > Key: DRILL-4984 > URL: https://issues.apache.org/jira/browse/DRILL-4984 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.8.0 > Environment: Latest 1.9 Snapshot, also 1.8 release version, > mysql-connector-java-5.1.30, mysql-connector-java-5.1.40 >Reporter: Holger Kiel > > NullPointerExceptions occur when a query with 'limit 0' is executed on a jdbc > storage source (e.g. Mysql): > {code} > 0: jdbc:drill:zk=local> select * from mysql.sugarcrm.sales_person limit 0; > Error: SYSTEM ERROR: NullPointerException > [Error Id: 6cd676fc-6db9-40b3-81d5-c2db044aeb77 on localhost:31010] > (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception > during fragment initialization: null > org.apache.drill.exec.work.foreman.Foreman.run():281 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > Caused By (java.lang.NullPointerException) null > > org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans.visit():55 > org.apache.calcite.rel.core.TableScan.accept():166 > org.apache.calcite.rel.RelShuttleImpl.visitChild():53 > org.apache.calcite.rel.RelShuttleImpl.visitChildren():68 > org.apache.calcite.rel.RelShuttleImpl.visit():126 > org.apache.calcite.rel.AbstractRelNode.accept():256 > org.apache.calcite.rel.RelShuttleImpl.visitChild():53 > org.apache.calcite.rel.RelShuttleImpl.visitChildren():68 > org.apache.calcite.rel.RelShuttleImpl.visit():126 > org.apache.calcite.rel.AbstractRelNode.accept():256 > > org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans.canForceSingleMode():45 > > 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():262 > > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():290 > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():168 > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan():123 > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():97 > org.apache.drill.exec.work.foreman.Foreman.runSQL():1008 > org.apache.drill.exec.work.foreman.Foreman.run():264 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 (state=,code=0) > 0: jdbc:drill:zk=local> select * from mysql.sugarcrm.sales_person limit 1; > +-+-+++-+ > | id | first_name | last_name| full_name | manager_id | > +-+-+++-+ > | 1 | null| Administrator | admin | 0 | > +-+-+++-+ > 1 row selected (0,235 seconds) > {code} > Other datasources are okay: > {code} > 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 0; > +--+---+---+-+--++-++--+-+---++-++-++--+-+-+--+ > | fqn | filename | filepath | suffix | employee_id | full_name | > first_name | last_name | position_id | position_title | store_id | > department_id | birth_date | hire_date | salary | supervisor_id | > education_level | marital_status | gender | management_role | > +--+---+---+-+--++-++--+-+---++-++-++--+-+-+--+ > +--+---+---+-+--++-++--+-+---++-++-++--+-+-+--+ > No rows selected (0,309 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
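The NullPointerException above originates in {{FindHardDistributionScans.visit()}} when it inspects a table scan backed by the JDBC storage plugin. A plausible shape of the failure, and of a defensive fix, is sketched below with invented types (this is not the actual Drill code, which is not shown in the report):

```java
// Toy model of the failure mode (invented names): a plan visitor that
// assumes every table scan exposes a Drill GroupScan would NPE on scans
// from external JDBC sources, where that metadata may be absent. The
// guarded visitor treats a missing GroupScan as "nothing to inspect"
// instead of dereferencing null.
class TableScan {
    private final Object groupScan; // may be null for JDBC-backed scans
    TableScan(Object groupScan) { this.groupScan = groupScan; }
    Object getGroupScan() { return groupScan; }
}

class HardDistributionVisitor {
    boolean contains = false;

    void visit(TableScan scan) {
        Object gs = scan.getGroupScan();
        if (gs == null) {
            return; // external scan: no Drill GroupScan to inspect
        }
        // ... the real code would inspect the scan's distribution affinity ...
        contains = true;
    }
}

class NullScanGuardDemo {
    public static void main(String[] args) {
        HardDistributionVisitor v = new HardDistributionVisitor();
        v.visit(new TableScan(null));   // no NPE with the guard in place
        System.out.println(v.contains); // prints false
    }
}
```

The `limit 1` query succeeding while `limit 0` fails suggests the null dereference only occurs on the limit-0 planning path, which is where the guard would matter.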
[jira] [Assigned] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong reassigned DRILL-4347: --- Assignee: Gautam Kumar Parai (was: Aman Sinha) Assigning to [~gparai] for review.
[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706525#comment-15706525 ] ASF GitHub Bot commented on DRILL-4347: --- Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/671 It is quite likely that the CachingRelMetadataProvider is meant for this. Based on the stack trace, there are multiple instances of "at org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:132)" and that line # indicates that there was either a cache miss or the entry was stale. So, the caching provider does in fact get used but then subsequently gets stuck in the apply() method of the ReflectiveRelMetadataProvider. I did not attempt to debug why it got stuck there...partly because I am not very familiar with the way reflection is used in this provider. Hence, my fix is an attempt to circumvent the issue.
[jira] [Commented] (DRILL-5065) Optimize count(*) queries on MapR-DB JSON Tables
[ https://issues.apache.org/jira/browse/DRILL-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706543#comment-15706543 ] ASF GitHub Bot commented on DRILL-5065: --- Github user adityakishore commented on the issue: https://github.com/apache/drill/pull/670 Could you please add a couple of unit tests to verify the plan? > Optimize count(*) queries on MapR-DB JSON Tables > - > > Key: DRILL-5065 > URL: https://issues.apache.org/jira/browse/DRILL-5065 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - MapRDB >Affects Versions: 1.9.0 > Environment: Clusters with MapR v5.2.0 and above >Reporter: Abhishek Girish >Assignee: Smidth Panchamia > > The JSON FileReader optimizes count(*) queries by only counting the number > of records in the files and discarding the data. This makes query > execution faster and more efficient. > We need a similar feature in the MapR format plugin (maprdb) to optimize _id-only > projection and count(*) queries on MapR-DB JSON Tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
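The count(*) optimization described in the issue — returning a record count without materializing any column data — can be sketched as follows. This is an illustrative standalone example with assumed names (`CountOnlyScan`, `countRecords`), not Drill's actual reader code:

```java
import java.util.List;

// Sketch of a count-only scan: for SELECT COUNT(*), no column values are
// needed, so the reader can sum per-file record counts (e.g. taken from
// file metadata) instead of deserializing every row.
public class CountOnlyScan {
    public static long countRecords(List<Long> perFileRecordCounts) {
        // Summing metadata counts avoids reading any row data at all.
        return perFileRecordCounts.stream().mapToLong(Long::longValue).sum();
    }
}
```

The same idea carries over to a format plugin: when the projection is empty or `_id`-only, skip decoding and just tally records.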
[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706579#comment-15706579 ] ASF GitHub Bot commented on DRILL-4347: --- Github user julianhyde commented on the issue: https://github.com/apache/drill/pull/671 Yes, CachingRelMetadataProvider is meant for this. After https://issues.apache.org/jira/browse/CALCITE-604, providers are considerably more efficient (not called via reflection), and RelMetadataQuery contains a per-request cache (because some metadata does change, but slowly). > Planning time for query64 from TPCDS test suite has increased 10 times > compared to 1.4 release > -- > > Key: DRILL-4347 > URL: https://issues.apache.org/jira/browse/DRILL-4347 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.5.0 >Reporter: Victoria Markman >Assignee: Gautam Kumar Parai > Fix For: Future > > Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, > 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
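The per-request cache mentioned in the comment above (RelMetadataQuery caching metadata for the duration of one query) can be pictured as simple memoization keyed by plan node. The class below is only a sketch of the idea with invented names; it is not Calcite's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a per-request metadata cache: the expensive computation runs at
// most once per distinct key for the lifetime of this object (one query),
// which is safe because metadata changes only slowly.
public class PerRequestMetadataCache {
    private final Map<String, Double> cache = new HashMap<>();
    private final Function<String, Double> compute;
    public int misses = 0; // how many times the underlying provider actually ran

    public PerRequestMetadataCache(Function<String, Double> compute) {
        this.compute = compute;
    }

    public double getRowCount(String relNodeId) {
        // Cache hit: return the stored value; miss: compute once and store.
        return cache.computeIfAbsent(relNodeId, key -> {
            misses++;
            return compute.apply(key);
        });
    }
}
```

With this shape, repeated metadata requests during planning stop re-walking the plan tree, which is the effect the comment attributes to CALCITE-604.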
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706599#comment-15706599 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r90091315 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/InvalidConnectionInfoException.java --- @@ -0,0 +1,29 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc; --- End diff -- discussed offline to keep it as is. > As per documentation, when issuing a list of drillbits in the connection > string, we always attempt to connect only to the first one > --- > > Key: DRILL-5015 > URL: https://issues.apache.org/jira/browse/DRILL-5015 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.8.0, 1.9.0 >Reporter: Sorabh Hamirwasia >Assignee: Sudheesh Katkam > Labels: ready-to-commit > > When trying to connect to a Drill cluster by specifying more than 1 drillbits > to connect to, we always attempt to connect to only the first drillbit. 
> As an example, we tested against a pair of drillbits, but we always connect
> to the first entry in the CSV list by querying for the 'current' drillbit.
> The remaining entries are never attempted.
> [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u
> "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f
> whereAmI.q | grep -v logback
> 1/1 select * from sys.drillbits where `current`;
> +-----------------+------------+---------------+------------+----------+
> | hostname        | user_port  | control_port  | data_port  | current  |
> +-----------------+------------+---------------+------------+----------+
> | pssc-61.qa.lab  | 31010      | 31011         | 31012      | true     |
> +-----------------+------------+---------------+------------+----------+
> 1 row selected (0.265 seconds)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> apache drill 1.8.0
> "a little sql for your nosql"
> This property is meant for use by clients that do not want to overload ZooKeeper
> by fetching the list of existing Drillbits, but the behaviour doesn't match
> the documentation.
> [Making a Direct Drillbit Connection |
> https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection
> ]
> We need to randomly shuffle this list, and if an entry in the shuffled
> list is unreachable, try the next entry in the list.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
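The shuffle-and-failover behavior the report asks for can be sketched as follows. `EndpointFailover` and `pickEndpoint` are hypothetical names, and reachability is abstracted behind a predicate rather than a real network probe:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the requested behavior: shuffle the endpoint list (so clients
// spread their load across drillbits), then try entries in order until one
// is reachable.
public class EndpointFailover {
    public static String pickEndpoint(List<String> endpoints, Predicate<String> isReachable) {
        List<String> shuffled = new ArrayList<>(endpoints);
        Collections.shuffle(shuffled);          // random starting choice
        for (String endpoint : shuffled) {
            if (isReachable.test(endpoint)) {   // failover past dead entries
                return endpoint;
            }
        }
        return null;                            // nothing reachable
    }
}
```

Shuffling before iteration gives both load balancing and failover: no single drillbit is always tried first, yet every entry is eventually attempted.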
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706597#comment-15706597 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r89928539 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -245,14 +327,15 @@ public synchronized void connect(String connect, Properties props) throws RpcExc throw new RpcException("Failure setting up ZK for client.", e); } } - - final ArrayList endpoints = new ArrayList<>(clusterCoordinator.getAvailableEndpoints()); - checkState(!endpoints.isEmpty(), "No DrillbitEndpoint can be found"); - // shuffle the collection then get the first endpoint - Collections.shuffle(endpoints); - endpoint = endpoints.iterator().next(); + endpoints.addAll(clusterCoordinator.getAvailableEndpoints()); + // Make sure we have at least one endpoint in the list + checkState(!endpoints.isEmpty(), "No active Drillbit endpoint found from zookeeper"); --- End diff -- "checkState" is only called when drillbit information is populated using ZooKeeper. For connection string case we are throwing InvalidConnectionInfoException. Changed the string to "ZooKeeper" > As per documentation, when issuing a list of drillbits in the connection > string, we always attempt to connect only to the first one > --- > > Key: DRILL-5015 > URL: https://issues.apache.org/jira/browse/DRILL-5015 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.8.0, 1.9.0 >Reporter: Sorabh Hamirwasia >Assignee: Sudheesh Katkam > Labels: ready-to-commit > > When trying to connect to a Drill cluster by specifying more than 1 drillbits > to connect to, we always attempt to connect to only the first drillbit. 
> As an example, we tested against a pair of drillbits, but we always connect > to the first entry in the CSV list by querying for the 'current' drillbit. > The remaining entries are never attempted. > [root@pssc-60 agileSqlPerfTests]# /opt/mapr/drill/drill-1.8.0/bin/sqlline -u > "jdbc:drill:schema=dfs.tmp;drillbit=pssc-61:31010,pssc-62:31010" -f > whereAmI.q | grep -v logback > 1/1 select * from sys.drillbits where `current`; > +-++---++--+ > |hostname | user_port | control_port | data_port | current | > +-++---++--+ > | pssc-61.qa.lab | 31010 | 31011 | 31012 | true | > +-++---++--+ > 1 row selected (0.265 seconds) > Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl > apache drill 1.8.0 > "a little sql for your nosql" > This property is meant for use by clients when not wanting to overload the ZK > for fetching a list of existing Drillbits, but the behaviour doesn't match > the documentation. > [Making a Direct Drillbit Connection | > https://drill.apache.org/docs/using-the-jdbc-driver/#using-the-jdbc-url-format-for-a-direct-drillbit-connection > ] > We need to randomly shuffle between this list and If an entry in the shuffled > list is unreachable, we need to try for the next entry in the list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706600#comment-15706600 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r89929966 --- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/InvalidConnectionInfoException.java --- @@ -0,0 +1,29 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.rpc; + +/** + * Exception for malformed connection string from client + */ +public class InvalidConnectionInfoException extends RpcException { --- End diff -- The connect method only throws RpcException. Also, I didn't want to extend the method signature to throw one more checked exception, since over time that list can grow.
> As per documentation, when issuing a list of drillbits in the connection > string, we always attempt to connect only to the first one > --- > > Key: DRILL-5015 > URL: https://issues.apache.org/jira/browse/DRILL-5015 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.8.0, 1.9.0 >Reporter: Sorabh Hamirwasia >Assignee: Sudheesh Katkam > Labels: ready-to-commit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
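The design choice discussed in the comment above — subclassing the already-declared exception type so that `connect()`'s `throws` clause does not need to grow — can be illustrated with a minimal sketch. These are simplified stand-ins, not the real Drill classes:

```java
// Sketch of the design choice: InvalidConnectionInfoException extends the
// exception type connect() already declares, so callers catching the base
// type keep working while the more specific failure stays distinguishable.
class RpcException extends Exception {
    RpcException(String message) { super(message); }
}

class InvalidConnectionInfoException extends RpcException {
    InvalidConnectionInfoException(String message) { super(message); }
}

public class ConnectSketch {
    // Declares only RpcException, yet can signal the more specific failure.
    public static void connect(String drillbits) throws RpcException {
        if (drillbits == null || drillbits.trim().isEmpty()) {
            throw new InvalidConnectionInfoException("No drillbit information specified");
        }
        // ... real connection logic would go here ...
    }
}
```

Callers that care can still `catch (InvalidConnectionInfoException e)` specifically; everyone else sees an ordinary `RpcException`.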
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706601#comment-15706601 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r90090906 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientTest.java --- @@ -0,0 +1,258 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.client; + +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.exec.DrillSystemTestBase; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.proto.CoordinationProtos; +import org.apache.drill.exec.rpc.InvalidConnectionInfoException; +import org.junit.Test; +import java.util.List; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.fail; + + +/** + * The unit test case will read a physical plan in json format. The physical plan contains a "trace" operator, + * which will produce a dump file. The dump file will be input into DumpCat to test query mode and batch mode. 
+ */ + +public class DrillClientTest extends DrillSystemTestBase { + + private final DrillConfig config = DrillConfig.create(); + + @Test + public void testParseAndVerifyEndpointsSingleDrillbitIp() throws Exception { + +// Test with single drillbit ip +final String drillBitConnection = "10.10.100.161"; +final List endpointsList = DrillClient.parseAndVerifyEndpoints + (drillBitConnection, config.getString(ExecConstants.INITIAL_USER_PORT)); +final CoordinationProtos.DrillbitEndpoint endpoint = endpointsList.get(0); +assertEquals(endpointsList.size(), 1); +assertEquals(endpoint.getAddress(), drillBitConnection); +assertEquals(endpoint.getUserPort(), config.getInt(ExecConstants.INITIAL_USER_PORT)); + } + + @Test + public void testParseAndVerifyEndpointsSingleDrillbitIpPort() throws Exception { + +// Test with single drillbit ip:port +final String drillBitConnection = "10.10.100.161:5000"; +final String[] ipAndPort = drillBitConnection.split(":"); +final List endpointsList = DrillClient.parseAndVerifyEndpoints + (drillBitConnection, config.getString(ExecConstants.INITIAL_USER_PORT)); +assertEquals(endpointsList.size(), 1); + +final CoordinationProtos.DrillbitEndpoint endpoint = endpointsList.get(0); +assertEquals(endpoint.getAddress(), ipAndPort[0]); +assertEquals(endpoint.getUserPort(), Integer.parseInt(ipAndPort[1])); + } + + @Test + public void testParseAndVerifyEndpointsMultipleDrillbitIp() throws Exception { + +// Test with multiple drillbit ip +final String drillBitConnection = "10.10.100.161,10.10.100.162"; +final List endpointsList = DrillClient.parseAndVerifyEndpoints + (drillBitConnection, config.getString(ExecConstants.INITIAL_USER_PORT)); +assertEquals(endpointsList.size(), 2); + +CoordinationProtos.DrillbitEndpoint endpoint = endpointsList.get(0); +assertEquals(endpoint.getAddress(), "10.10.100.161"); +assertEquals(endpoint.getUserPort(), config.getInt(ExecConstants.INITIAL_USER_PORT)); + +endpoint = endpointsList.get(1); 
+assertEquals(endpoint.getAddress(), "10.10.100.162"); +assertEquals(endpoint.getUserPort(), config.getInt(ExecConstants.INITIAL_USER_PORT)); + } + + @Test + public void testParseAndVerifyEndpointsMultipleDrillbitIpPort() throws Exception { + +// Test with multiple drillbit ip:port +final String drillBitConnection = "10.10.100.161:5000,10.10.100.162:5000"; +final List<CoordinationProtos.DrillbitEndpoint> endpointsList = DrillClient.parseAndVerifyEndpoints + (drillBitConnection, config.getString(ExecConstants.INITIAL_USER_PORT));
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706602#comment-15706602 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r89928643 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/DrillClientSystemTest.java --- @@ -17,11 +17,15 @@ */ package org.apache.drill.exec.client; +import java.util.ArrayList; --- End diff -- I will remove this file from the list, since I have moved all the changes to a different file, "DrillClientTest.java". > As per documentation, when issuing a list of drillbits in the connection > string, we always attempt to connect only to the first one > --- > > Key: DRILL-5015 > URL: https://issues.apache.org/jira/browse/DRILL-5015 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.8.0, 1.9.0 >Reporter: Sorabh Hamirwasia >Assignee: Sudheesh Katkam > Labels: ready-to-commit
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706598#comment-15706598 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r90089781 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +224,100 @@ public void connect(Properties props) throws RpcException { connect(null, props); } + /** + * Populates the endpointlist with drillbits information provided in the connection string by client. + * For direct connection we can have connection string with drillbit property as below: + * + * drillbit=ip + * use the ip specified as the Foreman ip with default port in config file + * drillbit=ip:port + * use the ip and port specified as the Foreman ip and port + * drillbit=ip1:port1,ip2:port2,... + * randomly select the ip and port pair from the specified list as the Foreman ip and port. 
+ * + * + * @param drillbits string with drillbit value provided in connection string + * @param defaultUserPort string with default userport of drillbit specified in config file + * @return list of drillbit endpoints parsed from connection string + * @throws InvalidConnectionInfoException if the connection string has invalid or no drillbit information + */ + static List<DrillbitEndpoint> parseAndVerifyEndpoints(String drillbits, String defaultUserPort) +throws InvalidConnectionInfoException { +// If no drillbits is provided then throw exception +drillbits = drillbits.trim(); +if (drillbits.isEmpty()) { + throw new InvalidConnectionInfoException("No drillbit information specified in the connection string"); +} + +ArrayList<DrillbitEndpoint> endpointList = new ArrayList<>(); +final String[] connectInfo = drillbits.split(","); + +// Fetch ip address and port information for each drillbit and populate the list +for (String drillbit : connectInfo) { + + // Trim all the empty spaces and check if the entry is empty string. + // Ignore the empty ones. + drillbit = drillbit.trim(); + + if (!drillbit.isEmpty()) { +// Verify if we have only ":" or only ":port" pattern +if (drillbit.charAt(0) == ':') { + // Invalid drillbit information + throw new InvalidConnectionInfoException("Malformed connection string with drillbit hostname or " + + "hostaddress missing for an entry: " + drillbit); +} + +// We are now sure that each ip:port entry will have both the values atleast once.
+// Split each drillbit connection string to get ip address and port value +final String[] drillbitInfo = drillbit.split(":"); + +// Check if we have more than one port +if (drillbitInfo.length > 2) { + throw new InvalidConnectionInfoException("Malformed connection string with more than one port in a " + + "drillbit entry: " + drillbit); +} + +// At this point we are sure that drillbitInfo has atleast hostname or host address +// trim all the empty spaces which might be present in front of hostname or +// host address information +final String ipAddress = drillbitInfo[0].trim(); +String port = defaultUserPort; + +if (drillbitInfo.length == 2) { + // We have a port value also given by user. trim all the empty spaces between : and port value before + // validating the correctness of value. + port = drillbitInfo[1].trim(); +} + +try { + final DrillbitEndpoint endpoint = DrillbitEndpoint.newBuilder() +.setAddress(ipAddress) + .setUserPort(Integer.parseInt(port)) +.build(); + + endpointList.add(endpoint); +} catch (NumberFormatException e) { + throw new InvalidConnectionInfoException("Malformed port value in entry: " + ipAddress + ":" + port + " " + + "passed in connection string"); +} + } +} +if(endpointList.size() == 0){ --- End diff -- fixed > As per documentation, when issuing a list of drillbits in the connection > string, we always attempt to connect only to the first one > -
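The parsing rules shown in the diff above can be condensed into a simplified standalone sketch. `EndpointParser` is a hypothetical class that returns plain "host:port" strings; the real method is `DrillClient.parseAndVerifyEndpoints`, which builds `DrillbitEndpoint` objects and throws `InvalidConnectionInfoException`:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified re-implementation of the parsing rules in the diff above:
// split on commas, ignore empty entries, reject ":port" entries with no
// host, reject entries with more than one colon, and fall back to the
// default user port when no port is given.
public class EndpointParser {
    public static List<String> parse(String drillbits, int defaultPort) {
        List<String> endpoints = new ArrayList<>();
        for (String entry : drillbits.trim().split(",")) {
            entry = entry.trim();
            if (entry.isEmpty()) {
                continue;                      // ignore empty entries between commas
            }
            if (entry.charAt(0) == ':') {      // ":port" with no host is malformed
                throw new IllegalArgumentException("Missing host in entry: " + entry);
            }
            String[] parts = entry.split(":");
            if (parts.length > 2) {            // more than one ':' is malformed
                throw new IllegalArgumentException("More than one port in entry: " + entry);
            }
            // NumberFormatException (an IllegalArgumentException) covers bad port values.
            int port = (parts.length == 2) ? Integer.parseInt(parts[1].trim()) : defaultPort;
            endpoints.add(parts[0].trim() + ":" + port);
        }
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("No drillbit information specified");
        }
        return endpoints;
    }
}
```

For example, "ip1:port1,ip2" yields two endpoints, the second on the default port, matching the behavior the javadoc in the diff describes.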
[jira] [Commented] (DRILL-5015) As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
[ https://issues.apache.org/jira/browse/DRILL-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706603#comment-15706603 ] ASF GitHub Bot commented on DRILL-5015: --- Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/648#discussion_r90089750 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java --- @@ -223,19 +224,100 @@ public void connect(Properties props) throws RpcException { connect(null, props); } + /** + * Populates the endpointlist with drillbits information provided in the connection string by client. + * For direct connection we can have connection string with drillbit property as below: + * + * drillbit=ip + * use the ip specified as the Foreman ip with default port in config file + * drillbit=ip:port + * use the ip and port specified as the Foreman ip and port + * drillbit=ip1:port1,ip2:port2,... + * randomly select the ip and port pair from the specified list as the Foreman ip and port. 
+ * + * + * @param drillbits string with drillbit value provided in connection string + * @param defaultUserPort string with default userport of drillbit specified in config file + * @return list of drillbit endpoints parsed from connection string + * @throws InvalidConnectionInfoException if the connection string has invalid or no drillbit information + */ + static List<DrillbitEndpoint> parseAndVerifyEndpoints(String drillbits, String defaultUserPort) +throws InvalidConnectionInfoException { +// If no drillbits is provided then throw exception +drillbits = drillbits.trim(); +if (drillbits.isEmpty()) { + throw new InvalidConnectionInfoException("No drillbit information specified in the connection string"); +} + +ArrayList<DrillbitEndpoint> endpointList = new ArrayList<>(); --- End diff -- fixed > As per documentation, when issuing a list of drillbits in the connection > string, we always attempt to connect only to the first one > --- > > Key: DRILL-5015 > URL: https://issues.apache.org/jira/browse/DRILL-5015 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Affects Versions: 1.8.0, 1.9.0 >Reporter: Sorabh Hamirwasia >Assignee: Sudheesh Katkam > Labels: ready-to-commit
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4455) Depend on Apache Arrow for Vector and Memory
[ https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706607#comment-15706607 ] Julian Hyde commented on DRILL-4455: [~jnadeau], [~sphillips], [~amansinha100] and [~parthc], Can all parties please agree (and state publicly for the record) that moving value vector code out of Drill and into Arrow is in the best interests of the Drill project? Most contributions can be managed by a process of submitting a patch, review, reject, revise, and repeat. But this is not one of those patches that can be casually kicked back to the contributor. It is huge, because it is an architectural change. I would like to see a commitment from both sides (contributor and reviewer) that we will find consensus and accept the patch. > Depend on Apache Arrow for Vector and Memory > > > Key: DRILL-4455 > URL: https://issues.apache.org/jira/browse/DRILL-4455 > Project: Apache Drill > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Fix For: 2.0.0 > > > The code for value vectors and memory has been split and contributed to the > apache arrow repository. In order to help this project advance, Drill should > depend on the arrow project instead of internal value vector code. > This change will require recompiling any external code, such as UDFs and > StoragePlugins. The changes will mainly just involve renaming the classes to > the org.apache.arrow namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
[ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706719#comment-15706719 ] ASF GitHub Bot commented on DRILL-4347:
---
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/671
Thanks @julianhyde... CALCITE-604 should potentially help with this. Drill's Calcite version has not caught up to this yet. Let me confer with @jinfengni sometime next week (he is on vacation until then) and get back on what can be done to get this into Drill. In the meantime, even though my patch addresses the hang issue, I will hold it for now.
> Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.5.0
> Reporter: Victoria Markman
> Assignee: Gautam Kumar Parai
> Fix For: Future
>
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
>   AS (SELECT cs_item_sk,
>              Sum(cs_ext_list_price) AS sale,
>              Sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit) AS refund
>       FROM catalog_sales,
>            catalog_returns
>       WHERE cs_item_sk = cr_item_sk
>         AND cs_order_number = cr_order_number
>       GROUP BY cs_item_sk
>       HAVING Sum(cs_ext_list_price) > 2 * Sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit)),
>   cross_sales
>   AS (SELECT i_product_name product_name,
>              i_item_sk item_sk,
>              s_store_name store_name,
>              s_zip store_zip,
>              ad1.ca_street_number b_street_number,
>              ad1.ca_street_name b_streen_name,
>              ad1.ca_city b_city,
>              ad1.ca_zip b_zip,
>              ad2.ca_street_number c_street_number,
>              ad2.ca_street_name c_street_name,
>              ad2.ca_city c_city,
>              ad2.ca_zip c_zip,
>              d1.d_year AS syear,
>              d2.d_year AS fsyear,
>              d3.d_year s2year,
>              Count(*) cnt,
>              Sum(ss_wholesale_cost) s1,
>              Sum(ss_list_price) s2,
>              Sum(ss_coupon_amt) s3
>       FROM store_sales,
>            store_returns,
>            cs_ui,
>            date_dim d1,
>            date_dim d2,
>            date_dim d3,
>            store,
>            customer,
>            customer_demographics cd1,
>            customer_demographics cd2,
>            promotion,
>            household_demographics hd1,
>            household_demographics hd2,
>            customer_address ad1,
>            customer_address ad2,
>            income_band ib1,
>            income_band ib2,
>            item
> ...
[jira] [Commented] (DRILL-5082) Metadata Cache is being refreshed every single time
[ https://issues.apache.org/jira/browse/DRILL-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707052#comment-15707052 ] Padma Penumarthy commented on DRILL-5082:
---
Tried with different file systems to see what the behavior is. HDFS, MapR FS, Linux, and Mac OS X all behave the same way: rename does not update the modification time of the file itself, but since the parent directory's contents changed, the directory's modification time is updated. One possible solution is, after the rename, to set the file's modification time to match the parent directory's, using the FileSystem setTimes API. I made the change and verified that it solves this problem. However, with concurrent access there could still be issues, which can only be solved with some kind of distributed locking mechanism in place. There is a window between rename and setTimes during which a read from another connection may still trigger a rebuild of the metadata cache, and setTimes after rename with no synchronization across multiple writers may produce inconsistent timestamps under concurrent writes.
> Metadata Cache is being refreshed every single time
>
> Key: DRILL-5082
> URL: https://issues.apache.org/jira/browse/DRILL-5082
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Reporter: Rahul Challapalli
> Assignee: Padma Penumarthy
> Priority: Critical
>
> Git Commit : 04fb0be191ef09409c00ca7173cb903dfbe2abb0
> After the DRILL-4381 fix we are refreshing the metadata cache for every single query. This could be because renaming a file updates the directory's timestamp but not the renamed file's.
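The proposed fix (copy the parent directory's modification time onto the renamed file) can be sketched with java.nio; the actual Drill patch would use Hadoop's FileSystem rename/setTimes instead, and `renameAndSyncMtime` is a hypothetical helper name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

public class MetadataCacheRename {
    // Rename the freshly written cache file into place, then copy the parent
    // directory's modification time onto it. Without the second step the
    // rename bumps the directory mtime but leaves the file mtime older, so
    // a staleness check of the form (file mtime < directory mtime) fires on
    // every query.
    static void renameAndSyncMtime(Path tmpFile, Path finalFile) throws IOException {
        Files.move(tmpFile, finalFile, StandardCopyOption.REPLACE_EXISTING);
        FileTime dirMtime = Files.getLastModifiedTime(finalFile.getParent());
        Files.setLastModifiedTime(finalFile, dirMtime);
    }
}
```

As the comment above notes, this narrows but does not close the race: between `move` and `setLastModifiedTime` a concurrent reader can still observe a stale-looking file.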
[jira] [Commented] (DRILL-3637) Elasticsearch storage plugin
[ https://issues.apache.org/jira/browse/DRILL-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707488#comment-15707488 ] Charles Givre commented on DRILL-3637:
--
Hello all, I stumbled on https://github.com/Anchormen/sql4es this evening and was wondering whether it might be of use in integrating Drill with Elasticsearch. It is a mostly complete JDBC driver for Elasticsearch. Here's the description: Sql-for-Elasticsearch (sql4es) is a JDBC 4.1 driver for Elasticsearch 2.0 - 2.4 implementing the majority of the JDBC interfaces: Connection, Statement, PreparedStatement, ResultSet, Batch and DataBase- / ResultSetMetaData. The screenshot below shows SQLWorkbenchJ with a selection of SQL statements that can be executed using the driver. As of version 0.8.2.3 the driver supports Shield, allowing the use of credentials and SSL.
> Elasticsearch storage plugin
>
> Key: DRILL-3637
> URL: https://issues.apache.org/jira/browse/DRILL-3637
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - ElasticSearch
> Reporter: Andrew
> Fix For: Future
>
> Create a storage plugin for elasticsearch
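For anyone who wants to experiment with sql4es directly, a minimal JDBC sketch follows. The `jdbc:sql4es://host:port/index` URL shape follows the sql4es README and is an assumption here, not anything defined by Drill; running the query itself requires the sql4es driver jar on the classpath and a reachable Elasticsearch 2.x cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Sql4esSketch {
    // Builds a sql4es JDBC URL; the shape is taken from the sql4es README
    // and should be treated as an assumption.
    static String sql4esUrl(String host, int port, String index) {
        return "jdbc:sql4es://" + host + ":" + port + "/" + index;
    }

    // Hypothetical usage: JDBC 4.1 drivers are discovered via ServiceLoader,
    // so no explicit Class.forName is needed once the jar is on the classpath.
    static void dumpFirstColumn(String host, int port, String index, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(sql4esUrl(host, port, index));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```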
[jira] [Updated] (DRILL-5031) Documentation for HTTPD Parser
[ https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-5031:
-
Fix Version/s: (was: Future)
               1.9.0
> Documentation for HTTPD Parser
>
> Key: DRILL-5031
> URL: https://issues.apache.org/jira/browse/DRILL-5031
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.9.0
> Reporter: Charles Givre
> Priority: Minor
> Labels: doc-impacting
> Fix For: 1.9.0
>
> https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c
[jira] [Updated] (DRILL-5031) Documentation for HTTPD Parser
[ https://issues.apache.org/jira/browse/DRILL-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-5031:
-
Flags: Patch,Important (was: Patch)
> Documentation for HTTPD Parser
>
> Key: DRILL-5031
> URL: https://issues.apache.org/jira/browse/DRILL-5031
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.9.0
> Reporter: Charles Givre
> Priority: Minor
> Labels: doc-impacting
> Fix For: 1.9.0
>
> https://gist.github.com/cgivre/47f07a06d44df2af625fc6848407ae7c