date:20190223

[jira] [Updated] (IMPALA-7207) Make Coordinator ExecState an atomic enum

2019-02-23 Thread Quanlong Huang (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-7207:
---
Fix Version/s: Impala 2.13.0

> Make Coordinator ExecState an atomic enum
> -
>
> Key: IMPALA-7207
> URL: https://issues.apache.org/jira/browse/IMPALA-7207
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Dan Hecht
>Assignee: Dan Hecht
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Let's make exec_state_ an atomic so that we can read that field alone without 
> holding the exec_state_lock_. That will be a precursor to both IMPALA-6788 
> and IMPALA-7205.
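
A minimal sketch of the pattern, written in Java purely for illustration (the coordinator itself lives in the C++ backend). The class name, the enum values and the method names are hypothetical; only exec_state_ and exec_state_lock_ come from the issue text. The idea shown: transitions are still serialized under the lock, while a caller that needs only the state reads the atomic field without taking it.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the coordinator; all names are illustrative only.
final class CoordinatorSketch {
  // Assumed state names, not taken from the issue.
  enum ExecState { EXECUTING, RETURNED_RESULTS, CANCELLED, ERROR }

  // Plays the role of exec_state_lock_: guards multi-step state transitions.
  private final ReentrantLock execStateLock = new ReentrantLock();
  // Plays the role of exec_state_: can be read on its own without the lock.
  private final AtomicReference<ExecState> execState =
      new AtomicReference<>(ExecState.EXECUTING);

  // Lock-free read of the state field alone.
  ExecState execState() {
    return execState.get();
  }

  // Transitions still take the lock so related bookkeeping stays consistent.
  // Returns true only if this call performed the transition.
  boolean transitionFromExecuting(ExecState next) {
    execStateLock.lock();
    try {
      if (execState.get() != ExecState.EXECUTING) return false;
      execState.set(next);
      return true;
    } finally {
      execStateLock.unlock();
    }
  }
}
```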



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Updated] (IMPALA-7191) Daemons should call srand() early in main rather than at random locations

2019-02-23 Thread Quanlong Huang (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-7191:
---
Fix Version/s: Impala 2.13.0

> Daemons should call srand() early in main rather than at random locations
> -
>
> Key: IMPALA-7191
> URL: https://issues.apache.org/jira/browse/IMPALA-7191
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Dan Hecht
>Assignee: Dan Hecht
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Currently we call srand() at "random" places. Let's call it once when the 
> daemons start up and remove the calls made at arbitrary places.




[jira] [Updated] (IMPALA-7121) Clean up partitionIds_ member from HdfsTable

2019-02-23 Thread Quanlong Huang (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-7121:
---
Fix Version/s: Impala 2.13.0

> Clean up partitionIds_ member from HdfsTable
> 
>
> Key: IMPALA-7121
> URL: https://issues.apache.org/jira/browse/IMPALA-7121
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> HdfsTable already has a number of internal structures that are meant to speed 
> up processes like partition pruning. partitionIds_ is a HashSet of partition 
> IDs, but we already have this information in partitionMap_, which is a 
> mapping between partition IDs and HdfsPartitions. As a result we can simply 
> drop partitionIds_ and modify getPartitionIds() to return 
> partitionMap_.keySet().
> This is not expected to introduce a regression for the following reasons:
>  * HashMap.keySet() is O(1), as it returns a wrapper around an 
> internal set of keys from the HashMap.
>  * We have to be careful not to modify the keySet() returned from 
> getPartitionIds() because that would also alter the partitionMap_ member. 
> This is safe as all call sites of getPartitionIds() immediately copy the 
> items of the set to a separate set.
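
The aliasing concern above can be sketched as follows. This is a simplified, hypothetical stand-in for HdfsTable (partitions are plain strings, and the class and helper names are made up); it only shows that the set returned by keySet() is a live view of the map, so callers should copy it before mutating.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for HdfsTable.
final class PartitionTableSketch {
  private final Map<Long, String> partitionMap = new HashMap<>();

  void addPartition(long id, String partition) {
    partitionMap.put(id, partition);
  }

  // Returns a live view backed by the map: no copy is made, so this is cheap,
  // but removing an element from the view also removes the map entry.
  Set<Long> getPartitionIds() {
    return partitionMap.keySet();
  }

  int numPartitions() {
    return partitionMap.size();
  }
}
```

A caller that wants to modify the IDs would copy first, for example with new HashSet<>(table.getPartitionIds()), which leaves the map untouched.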




[jira] [Updated] (IMPALA-7175) In a local FS build, test_native_functions_race thinks there are 2 impalads where there should be 1

2019-02-23 Thread Quanlong Huang (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-7175:
---
Fix Version/s: Impala 2.13.0

> In a local FS build, test_native_functions_race thinks there are 2 impalads 
> where there should be 1
> ---
>
> Key: IMPALA-7175
> URL: https://issues.apache.org/jira/browse/IMPALA-7175
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Tianyi Wang
>Assignee: Vuk Ercegovac
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> In TestUdfExecution.test_native_functions_race, the test checks the number of 
> impalads at the beginning and end of the test. In a local build there should 
> be only 1 impalad but somehow the test found 2 at the beginning of the test 
> and failed. 
> {noformat}
> Stacktrace
> query_test/test_udfs.py:379: in test_native_functions_race
> assert len(cluster.impalads) == exp_num_impalads
> E   assert 1 == 2
> E+  where 1 = len([ 0xc9ffa90>])
> E+where [ 0xc9ffa90>] =  0x6a5d510>.impalads
> {noformat}




[jira] [Updated] (IMPALA-7193) Local filesystem fails with fs.defaultFS (file:/tmp) is not supported

2019-02-23 Thread Quanlong Huang (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-7193:
---
Fix Version/s: Impala 2.13.0

> Local filesystem fails with fs.defaultFS (file:/tmp) is not supported
> --
>
> Key: IMPALA-7193
> URL: https://issues.apache.org/jira/browse/IMPALA-7193
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Tianyi Wang
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Impala failed to start on a local file system
> {noformat}
> I0619 07:25:34.034348 13157 status.cc:125] Currently configured default 
> filesystem: ProxyLocalFileSystem. fs.defaultFS (file:/tmp) is not supported.
> @  0x18a01f9  impala::Status::Status()
> @  0x1dd30b6  impala::Frontend::ValidateSettings()
> @  0x1dee2aa  impala::ImpalaServer::ImpalaServer()
> @  0x1dea8db  ImpaladMain()
> @  0x185c380  main
> @ 0x7f373f829c05  __libc_start_main
> @  0x185c1f1  (unknown)
> E0619 07:25:34.034366 13157 impala-server.cc:286] Currently configured 
> default filesystem: ProxyLocalFileSystem. fs.defaultFS (file:/tmp) is not 
> supported.
> E0619 07:25:34.034384 13157 impala-server.cc:289] Aborting Impala Server 
> startup due to improper configuration. Impalad exiting.
> {noformat}




[jira] [Created] (IMPALA-8241) from_utc_timestamp returns inconsistent results with Hive

2019-02-23 Thread Quanlong Huang (JIRA)

Quanlong Huang created IMPALA-8241:
--

 Summary: from_utc_timestamp returns inconsistent results with Hive
 Key: IMPALA-8241
 URL: https://issues.apache.org/jira/browse/IMPALA-8241
 Project: IMPALA
  Issue Type: Bug
Reporter: Quanlong Huang


This can be reproduced in both master and 2.x branches.

{code}
[localhost:21000] default> select from_utc_timestamp(cast(40 * 3600.0 as 
timestamp), 'EST');
Query: select from_utc_timestamp(cast(40 * 3600.0 as timestamp), 'EST')
Query submitted at: 2019-02-23 17:27:02 (Coordinator: 
http://impala-jenkins-slave-02:25000)
Query progress can be monitored at: 
http://impala-jenkins-slave-02:25000/query_plan?query_id=f476c87a904f281:71588a24
+---+
| from_utc_timestamp(cast(40 * 3600.0 as timestamp), 'est') |
+---+
| 2015-08-19 11:00:00   |
+---+
Fetched 1 row(s) in 0.64s
{code}

{code}
hive> select from_utc_timestamp(cast(40 * 3600.0 as timestamp), 'EST');
OK
2015-08-19 04:00:00
{code}
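
For reference, a small Java sketch of the conversion being compared. It rests on two assumptions that the report does not state: that the input denotes 2015-08-19 16:00:00 UTC, and that 'EST' means a fixed UTC-5 offset. Under those assumptions a UTC-to-EST shift gives 11:00:00, the value Impala prints above. The class and method names are illustrative.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class FromUtcSketch {
  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  // Treats utcWallClock as a UTC wall-clock time and shifts it to the
  // given fixed offset, returning the resulting wall-clock time as text.
  static String fromUtc(LocalDateTime utcWallClock, ZoneOffset target) {
    return utcWallClock.atOffset(ZoneOffset.UTC)
        .withOffsetSameInstant(target)
        .toLocalDateTime()
        .format(FMT);
  }
}
```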




[jira] [Work started] (IMPALA-7645) Allow configuring default file format via query option

2019-02-23 Thread Fredy Wijaya (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7645 started by Fredy Wijaya.

> Allow configuring default file format via query option
> --
>
> Key: IMPALA-7645
> URL: https://issues.apache.org/jira/browse/IMPALA-7645
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Fredy Wijaya
>Priority: Major
>
> It would be useful to have a query option to allow setting the default file 
> format. This would allow the file format to be overridden globally or 
> per-session. We already have a COMPRESSION_CODEC option. 
> We had some discussion on IMPALA-2210 related to this.
> The current default is hardcoded in the code: 
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L136
>  
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L145




[jira] [Assigned] (IMPALA-7645) Allow configuring default file format via query option

2019-02-23 Thread Fredy Wijaya (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya reassigned IMPALA-7645:


Assignee: Fredy Wijaya

> Allow configuring default file format via query option
> --
>
> Key: IMPALA-7645
> URL: https://issues.apache.org/jira/browse/IMPALA-7645
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Fredy Wijaya
>Priority: Major
>
> It would be useful to have a query option to allow setting the default file 
> format. This would allow the file format to be overridden globally or 
> per-session. We already have a COMPRESSION_CODEC option. 
> We had some discussion on IMPALA-2210 related to this.
> The current default is hardcoded in the code: 
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L136
>  
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L145




[jira] [Resolved] (IMPALA-7915) Wrap SQL parser to avoid redundant code

2019-02-23 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-7915.
-
Resolution: Fixed

> Wrap SQL parser to avoid redundant code
> ---
>
> Key: IMPALA-7915
> URL: https://issues.apache.org/jira/browse/IMPALA-7915
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The FE has several repeated blocks of code to set up the lexer and parser, to 
> parse, and to handle errors.
> Move this code into a static function that can be used in place of the copies.
> At the same time, provide a specific {{ParseException}} to replace the 
> generic {{Exception}} thrown by the parser to allow easier error handling.
> Some of the uses of the parser assume the return value is {{Object}}, others 
> that the value is {{ParseNode}} and still others that it is 
> {{StatementBase}}. Since the actual return is {{StatementBase}}, declare that 
> as the return value of the new static method to clearly state the actual 
> output.
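
A rough sketch of the shape such a wrapper could take. Everything here is a stand-in: rawParse() substitutes for the generated lexer/parser, the nested StatementBase is a placeholder for the real AST class, and the error message text is invented. Only the ideas of a single static entry point, a specific ParseException, and a StatementBase return type come from the description above.

```java
// Hypothetical stand-ins; the real lexer, parser and StatementBase are not shown.
final class ParserSketch {
  // Specific checked exception replacing the generic Exception.
  static final class ParseException extends Exception {
    ParseException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Placeholder for the AST root type that the parser actually returns.
  static final class StatementBase {
    final String sql;
    StatementBase(String sql) { this.sql = sql; }
  }

  // Placeholder for the generated parser, which returns Object and
  // throws a generic Exception on failure.
  private static Object rawParse(String sql) throws Exception {
    if (sql == null || sql.trim().isEmpty()) throw new Exception("empty statement");
    return new StatementBase(sql.trim());
  }

  // Single entry point: runs the parser, narrows the return type, and
  // converts any parser failure into ParseException.
  static StatementBase parse(String sql) throws ParseException {
    try {
      return (StatementBase) rawParse(sql);
    } catch (Exception e) {
      throw new ParseException("Syntax error in: " + sql, e);
    }
  }
}
```

Callers then catch one exception type and need no casts on the result.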




[jira] [Assigned] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2019-02-23 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7310:
---

Assignee: (was: Paul Rogers)

> Compute Stats not computing NULLs as a distinct value causing wrong estimates
> -
>
> Key: IMPALA-7310
> URL: https://issues.apache.org/jira/browse/IMPALA-7310
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, 
> Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Zsombor Fedor
>Priority: Major
>
> As in other DBMSs,
> {code:java}
> NDV(col){code}
> does not count NULL as a distinct value. The same also applies to
> {code:java}
> COUNT(DISTINCT col){code}
> This is working as intended, but when computing column statistics it can 
> cause some anomalies (e.g. bad join order) as compute stats uses NDV() to 
> determine column NDVs.
>  
> For example, when aggregating over multiple columns, the estimated cardinality is 
> [counted as the product of the columns' number of distinct 
> values.|https://github.com/cloudera/Impala/blob/64cd0bb0c3529efa0ab5452c4e9e2a04fd815b4f/fe/src/main/java/org/apache/impala/analysis/Expr.java#L669]
>  If there is a column full of NULLs the whole product will be 0.
>  
> There are two possible fixes for this.
> Either we should count NULLs as a distinct value when computing stats in the 
> query:
> {code:java}
> SELECT NDV(a) + COUNT(DISTINCT CASE WHEN a IS NULL THEN 1 END) AS a, CAST(-1 
> as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
> instead of
> {code:java}
> SELECT NDV(a) AS a, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
>  
>  
> Or we should change the planner 
> [function|https://github.com/cloudera/Impala/blob/2d2579cb31edda24457d33ff5176d79b7c0432c5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L169]
>  to take care of this bug.
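
The arithmetic behind the anomaly can be sketched as below. This is not the planner code; the class and method names are hypothetical. It only shows how one all-NULL column (NDV 0) zeroes the product, and how counting NULL as one extra distinct value, as in the first proposed fix, avoids that.

```java
final class NdvSketch {
  // Product of per-column NDVs, as described for multi-column aggregation.
  static long productOfNdvs(long[] ndvs) {
    long product = 1;
    for (long ndv : ndvs) {
      product *= ndv;
    }
    return product;
  }

  // First proposed fix: count NULL as one extra distinct value
  // when the column contains at least one NULL.
  static long ndvCountingNull(long ndv, boolean hasNulls) {
    return hasNulls ? ndv + 1 : ndv;
  }
}
```

With columns of NDV 10 and 0 (all NULL), the plain product is 0; with the adjustment the second column contributes 1 and the product is 10.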
>  




[jira] [Resolved] (IMPALA-7847) Standardize expression error message in analyzer to ease testing

2019-02-23 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-7847.
-
Resolution: Won't Fix

Upon reflection, while this seems like a good idea, the cost of making the 
change is not worth the benefit. Copy & paste of error messages appears to work 
well enough.

> Standardize expression error message in analyzer to ease testing
> 
>
> Key: IMPALA-7847
> URL: https://issues.apache.org/jira/browse/IMPALA-7847
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Trivial
>
> The analyzer checks expressions in all clauses to exclude unsupported 
> features such as subqueries, aggregates or analytic expressions. When found, 
> the [analyzer emits an error 
> message|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java].
>  At present, these messages are wonderfully inconsistent, which makes testing 
> tedious:
> {quote}
> "aggregate function not allowed in WHERE clause"
> "WHERE clause must not contain analytic expressions: " + e.toSql()
> "HAVING clause must not contain analytic expressions: "
>  + analyticExpr.toSql()
> "Subqueries are not supported in the ORDER BY clause."
> {quote}
> The proposal is to standardize the messages as follows:
> {quote}
>  <expression type> are not supported in <clause>: <expression>
> {quote}
> Where <expression type> is "Subqueries", "Analytic expressions" or "Aggregate 
> functions", <clause> is "SELECT list", "WHERE clause", "ORDER BY clause", 
> "HAVING clause" or "GROUP BY clause", and <expression> is the 
> before-rewrite version of the expression in question.
> The result will be that tests are a bit easier to write since we need not 
> track down the specific odd wording for each error case.
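
A minimal sketch of a helper that would produce such messages, assuming the template has three slots (the kind of construct, the clause name, and the expression SQL) as the description indicates. The class and method names are hypothetical.

```java
final class AnalysisMessageSketch {
  // Builds "<kind> are not supported in <clause>: <expression SQL>".
  static String notSupported(String kind, String clause, String exprSql) {
    return String.format("%s are not supported in %s: %s", kind, clause, exprSql);
  }
}
```

A test could then build its expected string with the same three values instead of copying each message by hand.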




[jira] [Resolved] (IMPALA-7805) NumericLiteral toSql() should render zero as 0, not 0-E38, 0.000, etc.

2019-02-23 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-7805.
-
Resolution: Fixed

> NumericLiteral toSql() should render zero as 0, not 0-E38, 0.000, etc.
> --
>
> Key: IMPALA-7805
> URL: https://issues.apache.org/jira/browse/IMPALA-7805
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Testing of other issues revealed a somewhat bizarre aspect of how the planner 
> expression nodes render 0. {{NumericLiteral.toSql()}} uses the Java 
> {{BigDecimal}} class to convert a numeric value to a string for use in 
> explained plans.
> The default Java behavior is to consider scale when rendering numbers, 
> including 0. Thus, depending on precision and scale, you may get:
> {noformat}
> 0
> 0.0
> 0.00
> 0.000
> ...
> 0E-38
> {noformat}
> Mathematically, zero is zero. Unlike Java, SQL attaches no significance to 
> the decimal point. (In Java, 0 is an integer, 0.0 is a float.) Nor does SQL 
> attach significance to the number of zeros past the decimal point. And, of 
> course, we're only talking about the output of {{EXPLAIN}}, which is never 
> parsed anyway (except in tests.)
> To make testing easier, change the behavior to always emit "0" when the value 
> is zero, regardless of precision or scale.
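
The behavior described above is easy to reproduce with plain BigDecimal. The sketch below is not the actual NumericLiteral code; the class and method names are made up, and it shows one simple way to special-case zero while leaving other values to BigDecimal's default rendering.

```java
import java.math.BigDecimal;

final class NumericLiteralSketch {
  // Renders zero as "0" regardless of scale; all other values
  // keep BigDecimal's default string form.
  static String toSql(BigDecimal value) {
    return value.signum() == 0 ? "0" : value.toString();
  }
}
```

For comparison, new BigDecimal("0.000").toString() keeps the scale and BigDecimal.ZERO.setScale(38).toString() switches to exponent notation, whereas toSql() returns "0" for both.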




[jira] [Assigned] (IMPALA-7608) Estimate row count from file size when no stats available

2019-02-23 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7608:
---

Assignee: (was: Paul Rogers)

> Estimate row count from file size when no stats available
> -
>
> Key: IMPALA-7608
> URL: https://issues.apache.org/jira/browse/IMPALA-7608
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Major
>
> Impala makes heavy use of stats, which is a good thing. Stats feed into query 
> planning where they allow the planner to choose among a fixed set of 
> alternatives such as: do I put t1 on the build or probe side of a join?
> Because the planner decisions tend to be discrete, we only need enough 
> information to decide whether to do A or B (or, more generally, to choose 
> among a set of choices A, B, C, ... N).
> Often data sizes are vastly different on different paths. Stats help refine 
> these numbers, but much of the information just needs to be in the ball park: 
> is table t1 larger or smaller than t2? Often, one table is much larger than 
> the other, so even a rough size estimate will force the right decision (put 
> the smaller table on the build side of a join.)
> Today, if Impala has no stats, it refuses to even consider table size. 
> Consider the following unit test:
> {noformat}
> runTest("SELECT a FROM functional.tinytable;", -1);
> {noformat}
> This plans the given query, then verifies that the expected result 
> cardinality is the number given. In this case, {{tinytable}} has no stats. 
> So, we don't know the cardinality. OK...
> The table turns out to be 3 rows. Perhaps I join this to a hypothetical 
> {{hugetable}} of 1 million rows. Without even a guess at cardinality, Impala 
> can't choose a good plan.
> The suggestion is to use table size to estimate row cardinality. Come up with 
> some assumed row width, say 100. Then, estimate row count as {{file size / 
> est. row width}}. This gives a ballpark number that would be plenty good for 
> the planner to choose the proper plan much of the time. 
> Since this is such an easy estimate to make, and will address the occasional 
> case in which stats are not available, it seems a shame to not take advantage 
> of this information.
> In terms of implementation, {{HdfsScanNode.computeCardinalities()}} already 
> uses some extrapolation, if enabled. It can be extended to do the last-ditch 
> extrapolation suggested above if, after the current techniques, the 
> cardinality is still undefined.
> If we apply this simple fix in a prototype build, the new test result is 
> closer to reality:
> {noformat}
> runTest("SELECT a FROM functional.tinytable;", 1);
> {noformat}
> Given that the fix is so simple, any reason not to use the file size, when 
> available? Is 100 a reasonable assumed row width? Should this functionality 
> always be on, not just when enabled using the back-end config?
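
The suggested fallback can be sketched as follows. This is not the HdfsScanNode code; the class and method names are hypothetical, -1 is used for "unknown" as in the test above, and rounding up is an added assumption so that a small non-empty table is never estimated as 0 rows.

```java
final class RowCountEstimateSketch {
  // Assumed average row width in bytes; 100 is the value floated above.
  static final long ASSUMED_ROW_WIDTH_BYTES = 100;

  // Returns the stats cardinality when known (>= 0); otherwise a guess
  // derived from file size; -1 when neither is known.
  static long estimateCardinality(long statsCardinality, long totalFileBytes) {
    if (statsCardinality >= 0) return statsCardinality;
    if (totalFileBytes < 0) return -1;
    if (totalFileBytes == 0) return 0;
    // Round up (an assumption of this sketch, not part of the proposal).
    return (totalFileBytes + ASSUMED_ROW_WIDTH_BYTES - 1) / ASSUMED_ROW_WIDTH_BYTES;
  }
}
```

Under these assumptions any non-empty file smaller than 100 bytes yields an estimate of 1 row, and a 100 MB file yields 1 million rows, which is enough to tell a tiny table from a huge one.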




[jira] [Resolved] (IMPALA-7914) Introduce AST base class/interface for statement-like nodes

2019-02-23 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-7914.
-
Resolution: Fixed

> Introduce AST base class/interface for statement-like nodes
> ---
>
> Key: IMPALA-7914
> URL: https://issues.apache.org/jira/browse/IMPALA-7914
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The front-end is based on an abstract syntax tree (AST). The parser creates 
> the AST, the analyzer decorates it with semantic information, and the planner 
> creates a plan for it.
> At present, the class hierarchy looks like this:
> {noformat}
> ParseNode
> |-- Expr
> |   |-- 
> |-- FromClause
> |-- 
> |-- StatementBase
> |   |-- SelectStmt
> |   |-- 
> {noformat}
> This is a nuisance because the only common base class for all statement-like 
> nodes is the {{ParseNode}}, which is also the common base class for 
> expressions. However, expressions and statement-like nodes behave differently 
> during analysis, SQL generation, and so on.
> We propose to refactor the tree to introduce a new {{StmtNode}} interface or 
> class that defines the statement-like semantics, leaving {{Expr}} to define 
> the expression-like semantics. The methods then move out of {{ParseNode}}.
> This change will allow revising the analysis step as follows:
> * Analysis of statement-like nodes is done "in place"
> * Analysis of expression nodes may result in replacing one node with a 
> rewritten version
> Similarly, when generating SQL:
> * Statements provide the option of generating before- or after-rewrite SQL.
> * Expressions can only generate SQL for what they are; they have no concept 
> of before- or after- rewrites.
> Specifically:
> {noformat}
> ParseNode
> |-- Expr
> |   |-- 
> |-- StmtNode
> |   |-- FromClause
> |   |-- 
> |   |-- StatementBase
> |   |   |-- SelectStmt
> |   |   |-- 
> {noformat}
> It may be useful to introduce a {{ClauseNode}}, but we'll see if that is 
> actually helpful as we do the refactoring:
> {noformat}
> |-- StmtNode
> |   |-- ClauseNode
> |   |   |-- FromClause
> |   |   |-- 
> |   |-- StatementBase
> |   |   |-- SelectStmt
> |   |   |-- 
> {noformat}
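
One possible shape for the proposed split, as a sketch only. The type names ParseNode, Expr, StmtNode, StatementBase and SelectStmt come from the description above; every method signature, field and constructor here is hypothetical and far simpler than the real classes.

```java
// Common root: anything that can be rendered back to SQL.
interface ParseNode {
  String toSql();
}

// Expression-like semantics: analysis may return a rewritten replacement node.
interface Expr extends ParseNode {
  Expr analyze();
}

// Statement-like semantics: analyzed in place, and able to emit
// either before-rewrite or after-rewrite SQL.
interface StmtNode extends ParseNode {
  void analyze();

  String toSql(boolean rewritten);

  // By default, statements render their before-rewrite SQL.
  @Override
  default String toSql() {
    return toSql(false);
  }
}

abstract class StatementBase implements StmtNode {
  protected boolean analyzed = false;

  @Override
  public void analyze() {
    analyzed = true; // in-place analysis: the node itself is decorated
  }
}

final class SelectStmt extends StatementBase {
  private final String originalSql;
  private final String rewrittenSql;

  SelectStmt(String originalSql, String rewrittenSql) {
    this.originalSql = originalSql;
    this.rewrittenSql = rewrittenSql;
  }

  @Override
  public String toSql(boolean rewritten) {
    return rewritten ? rewrittenSql : originalSql;
  }
}
```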




[jira] [Resolved] (IMPALA-7807) Analysis test fixture to enable deeper testing

2019-02-23 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-7807.
-
Resolution: Fixed

> Analysis test fixture to enable deeper testing
> --
>
> Key: IMPALA-7807
> URL: https://issues.apache.org/jira/browse/IMPALA-7807
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> The Impala front-end provides a number of JUnit tests such as 
> {{ExprRewriteRulesTest}}. These tests verify rewrites by providing layers of 
> functions that build up a query, analyze the query, run rewrite rules, and 
> test one part of the result.
> The tests are fine as far as they go, but they do not cover all cases. For 
> example, they test rewrites in the {{SELECT}} clause, but not {{ORDER BY}} 
> or {{GROUP BY}}. (Testing of those uncovered previously hidden bugs.) In some 
> cases, we want to test rewrite rules in detail, but the existing tests only 
> support a wholesale rewrite.
> Since the existing tests are function based, it is hard to inject new 
> behavior somewhere in the process, for example, to test the {{WHERE}} clause 
> rather than {{SELECT}}. To do that, we need to copy the {{SELECT}} functions, 
> and make changes to test {{WHERE}}.
> Since copying of code is generally an undesirable approach, a better approach 
> is to use a "test fixture": a class that performs the required steps, 
> maintains intermediate state for inspection, and acts as the foundation for 
> various kinds of tests (such as the various clauses mentioned above.)
> In practice, all that is required is moving some code from functions on the 
> test class to be methods on a fixture class, which also holds onto state that 
> would otherwise be lost in function calls.



