[jira] [Updated] (IMPALA-7564) Conservative FK/PK join type detection with complex equi-join conjuncts

2018-09-12 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7564:
--
Description: 
With IMPALA-5547, we predict whether a join is an FK/PK join as follows.

{noformat}
// Iterate over all groups of conjuncts that belong to the same joined tuple id pair.
// For each group, we compute the join NDV of the rhs slots and compare it to the
// number of rows in the rhs table.
for (List<EqJoinConjunctScanSlots> fkPkCandidate: scanSlotsByJoinedTids.values()) {
  double jointNdv = 1.0;
  for (EqJoinConjunctScanSlots slots: fkPkCandidate) jointNdv *= slots.rhsNdv();
  double rhsNumRows = fkPkCandidate.get(0).rhsNumRows();
  if (jointNdv >= Math.round(rhsNumRows * (1.0 - FK_PK_MAX_STATS_DELTA_PERC))) {
    // We cannot disprove that the RHS is a PK.
    if (result == null) result = Lists.newArrayList();
    result.addAll(fkPkCandidate);
  }
}
{noformat}

We iterate through all the "simple" equi-join conjuncts on the RHS, multiply their NDVs, 
and check whether the product is close to rhsNumRows. The issue here is that this can 
result in conservative FK/PK detection if the equi-join conjuncts are not simple, i.e. 
not of the form <SlotRef> = <SlotRef>.

{noformat}
/**
 * Returns a new EqJoinConjunctScanSlots for the given equi-join conjunct or null if
 * the given conjunct is not of the form <SlotRef> = <SlotRef> or if the underlying
 * table/column of at least one side is missing stats.
 */
public static EqJoinConjunctScanSlots create(Expr eqJoinConjunct) {
  if (!Expr.IS_EQ_BINARY_PREDICATE.apply(eqJoinConjunct)) return null;
  SlotDescriptor lhsScanSlot = eqJoinConjunct.getChild(0).findSrcScanSlot();
  if (lhsScanSlot == null || !hasNumRowsAndNdvStats(lhsScanSlot)) return null;
  SlotDescriptor rhsScanSlot = eqJoinConjunct.getChild(1).findSrcScanSlot();
{noformat}

For example, the following query contains a complex equi-join conjunct 
{{substr(l.c3, 1, 6) = substr(r.c3, 1, 6)}}, so while detecting whether the left 
outer join is an FK/PK join we only check whether 
{{NDVs(r.c1) * NDVs(r.c2) ~ r.numRows()}}, which is incorrect. (This happens 
because EqJoinConjunctScanSlots.create() returns null for any non-simple 
predicate, so such predicates are not considered later.)

{noformat}
[localhost:21000]> explain select * from test_left l left outer join test_right r on l.c1 = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6);
Query: explain select * from test_left l left outer join test_right r on l.c1 = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6)
+------------------------------------------------------------------------------------------+
| Explain String
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.95MB Threads=5
| Per-Host Resource Estimates: Memory=66MB
|
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| |  Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1
| PLAN-ROOT SINK
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0
| |
| 04:EXCHANGE [UNPARTITIONED]
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0
| |  tuple-ids=0,1N row-size=94B cardinality=49334767023
| |  in pipelines: 00(GETNEXT)
| |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.95MB thread-reservation=2
| 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| |  hash predicates: l.c1 = r.c1, l.c2 = r.c2, substr(l.c3, 1, 6) = substr(r.c3, 1, 6)
| |  fk/pk conjuncts: none
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
| |  tuple-ids=0,1N row-size=94B cardinality=49334767023
| |  in pipelines: 00(GETNEXT), 01(OPEN)
| |
| |--03:EXCHANGE [BROADCAST]
| |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
{noformat}
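
To make the effect of the check concrete, here is a minimal, self-contained sketch of the joint-NDV 
comparison described above. It is an illustration only: the class name, the stats values and the 
FK_PK_MAX_STATS_DELTA_PERC constant below are made up for the example, and only the shape of the 
comparison mirrors the planner snippet quoted earlier. When the complex conjunct's NDV is left out of 
the product, the check falls short of the row count and the join is (conservatively) not treated as FK/PK.

{code:java}
// Hypothetical, simplified illustration of the joint-NDV FK/PK check; not Impala code.
public class FkPkCheckSketch {
  // Assumed threshold, chosen only for this example.
  static final double FK_PK_MAX_STATS_DELTA_PERC = 0.05;

  // Returns true if the product of the RHS NDVs is close enough to the RHS row count,
  // i.e. we cannot disprove that the RHS side is a PK.
  static boolean looksLikeFkPk(double rhsNumRows, double... rhsNdvs) {
    double jointNdv = 1.0;
    for (double ndv : rhsNdvs) jointNdv *= ndv;
    return jointNdv >= Math.round(rhsNumRows * (1.0 - FK_PK_MAX_STATS_DELTA_PERC));
  }

  public static void main(String[] args) {
    // Made-up stats: test_right has 1,000,000 rows, NDV(c1)=1,000, NDV(c2)=100,
    // NDV(substr(c3, 1, 6))=10, and (c1, c2, substr(c3, 1, 6)) is effectively a PK.
    System.out.println(looksLikeFkPk(1_000_000, 1_000, 100));     // false: complex conjunct skipped
    System.out.println(looksLikeFkPk(1_000_000, 1_000, 100, 10)); // true: all three NDVs included
  }
}
{code}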

[jira] [Updated] (IMPALA-7564) Conservative FK/PK join type detection with complex equi-join conjuncts

2018-09-12 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7564:
--
Summary: Conservative FK/PK join type detection with complex equi-join conjuncts
         (was: Conservative FK/PK join type estimation with complex equi-join conjuncts)

> Conservative FK/PK join type detection with complex equi-join conjuncts
> ---
>
> Key: IMPALA-7564
> URL: https://issues.apache.org/jira/browse/IMPALA-7564
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0, Impala 2.13.0, Impala 3.1.0
>Reporter: bharath v
>Priority: Major

[jira] [Resolved] (IMPALA-7426) T-test is an unreliable method for comparing non-normal distributions

2018-09-12 Thread Jim Apple (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple resolved IMPALA-7426.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

https://github.com/apache/impala/commit/cb26d8d82880e934aae9ef211ca7752de510e84a

> T-test is an unreliable method for comparing non-normal distributions
> -
>
> Key: IMPALA-7426
> URL: https://issues.apache.org/jira/browse/IMPALA-7426
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Jim Apple
>Assignee: Jim Apple
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> https://en.wikipedia.org/wiki/Student%27s_t-test is for normally distributed 
> variables, and many Impala benchmarks won't be normally distributed. In 
> particular, none of them will have negative run times.
> We should consider https://en.wikipedia.org/wiki/Nonparametric_statistics 
> like 
> https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Comparison_to_Student's_t-test
>  in report_benchmark_results.py.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-7426) T-test is an unreliable method for comparing non-normal distributions

2018-09-12 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612744#comment-16612744
 ] 

ASF subversion and git services commented on IMPALA-7426:
-

Commit cb26d8d82880e934aae9ef211ca7752de510e84a in impala's branch 
refs/heads/master from [~jbapple]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=cb26d8d ]

IMPALA-7426: Use Mann-Whitney U to compare benchmarks

The Mann-Whitney test can be used to compare samples taken from
non-normal distributions, and so can more accurately reflect
performance changes than a T-test. This patch does not remove t-tests
from the benchmark reporting, it just supplements them by including
the Mann-Whitney test result as well.

Change-Id: I8d6631ebeba1422b832def5cd68537624f672fa0
Reviewed-on: http://gerrit.cloudera.org:8080/11194
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins 
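
For readers unfamiliar with the two tests, the following is a small illustration of why a rank-based 
test is preferable here. It is not part of report_benchmark_results.py (which is Python); it uses 
Apache Commons Math (commons-math3), and the run-time samples are invented: a heavy-tailed outlier 
inflates the t-test's variance estimate and hides the regression, while the Mann-Whitney U test, 
being rank-based, still flags it.

{code:java}
import org.apache.commons.math3.stat.inference.MannWhitneyUTest;
import org.apache.commons.math3.stat.inference.TTest;

public class BenchmarkCompareSketch {
  public static void main(String[] args) {
    // Hypothetical query run times in seconds; "after" is consistently slower and also
    // contains one heavy-tailed outlier, as benchmark runs sometimes do.
    double[] before = {1.02, 0.98, 1.01, 0.99, 1.03, 1.00, 0.97, 1.02};
    double[] after  = {1.10, 1.12, 1.09, 1.11, 1.13, 1.08, 1.12, 9.50};

    // Two-sided Welch t-test p-value: the outlier inflates the variance, masking the change.
    double tPValue = new TTest().tTest(before, after);

    // Two-sided Mann-Whitney U test p-value: rank-based, robust to the outlier.
    double uPValue = new MannWhitneyUTest().mannWhitneyUTest(before, after);

    System.out.printf("t-test p=%.4f, Mann-Whitney U p=%.4f%n", tPValue, uPValue);
  }
}
{code}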


> T-test is an unreliable method for comparing non-normal distributions
> -
>
> Key: IMPALA-7426
> URL: https://issues.apache.org/jira/browse/IMPALA-7426
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Jim Apple
>Assignee: Jim Apple
>Priority: Major
>
> https://en.wikipedia.org/wiki/Student%27s_t-test is for normally distributed 
> variables, and many Impala benchmarks won't be normally distributed. In 
> particular, none of them will have negative run times.
> We should consider https://en.wikipedia.org/wiki/Nonparametric_statistics 
> like 
> https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Comparison_to_Student's_t-test
>  in report_benchmark_results.py.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7563) Hive runner runs hive in local.. Any similar thing related to this which runs impala in local

2018-09-12 Thread Jim Apple (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple resolved IMPALA-7563.
---
Resolution: Not A Bug

Please use 
[d...@impala.apache.org|https://mail-archives.apache.org/mod_mbox/impala-dev/] 
for questions about developing Impala and 
[u...@impala.apache.org|https://mail-archives.apache.org/mod_mbox/impala-user/] 
for questions about using Impala.

> Hive runner runs hive in local.. Any similar thing related to this which runs 
> impala in local
> -
>
> Key: IMPALA-7563
> URL: https://issues.apache.org/jira/browse/IMPALA-7563
> Project: IMPALA
>  Issue Type: Question
>  Components: Backend, Infrastructure
>Reporter: Naga Venkata Giridhar
>Priority: Major
>
> HiveRunner runs Hive locally. Is there anything similar to HiveRunner that 
> runs Impala locally?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-5031) UBSAN clean and method for testing UBSAN cleanliness

2018-09-12 Thread Jim Apple (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612526#comment-16612526
 ] 

Jim Apple commented on IMPALA-5031:
---

A note about implementation-defined behavior, which is preferable to undefined 
behavior:

GCC documents its implementation-defined behavior here: 
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation.
 Clang does not document its implementation-defined behavior yet: 
https://bugs.llvm.org/show_bug.cgi?id=11272.

> UBSAN clean and method for testing UBSAN cleanliness
> 
>
> Key: IMPALA-5031
> URL: https://issues.apache.org/jira/browse/IMPALA-5031
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend, Infrastructure
>Affects Versions: Impala 2.9.0
>Reporter: Jim Apple
>Assignee: Jim Apple
>Priority: Minor
>
> http://releases.llvm.org/3.8.0/tools/clang/docs/UndefinedBehaviorSanitizer.html
>  builds are supported after https://gerrit.cloudera.org/#/c/6186/, but 
> Impala's test suite triggers many errors under UBSAN. Those errors should be 
> fixed and then there should be a way to run the test suite under UBSAN and 
> fail if there were any errors detected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7563) Hive runner runs hive in local.. Any similar thing related to this which runs impala in local

2018-09-12 Thread Naga Venkata Giridhar (JIRA)
Naga Venkata Giridhar created IMPALA-7563:
-

 Summary: Hive runner runs hive in local.. Any similar thing 
related to this which runs impala in local
 Key: IMPALA-7563
 URL: https://issues.apache.org/jira/browse/IMPALA-7563
 Project: IMPALA
  Issue Type: Question
  Components: Backend, Infrastructure
Reporter: Naga Venkata Giridhar


HiveRunner runs Hive locally. Is there anything similar to HiveRunner that runs 
Impala locally?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (IMPALA-6772) Enable test_scanners_fuzz for ORC format

2018-09-12 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612420#comment-16612420
 ] 

Tim Armstrong commented on IMPALA-6772:
---

Feel free to post a fix for native-toolchain. I won't get to look in detail 
until next week most likely.

> Enable test_scanners_fuzz for ORC format
> 
>
> Key: IMPALA-6772
> URL: https://issues.apache.org/jira/browse/IMPALA-6772
> Project: IMPALA
>  Issue Type: Test
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Currently, we haven't enabled test_scanner_fuzz for ORC yet, since the ORC 
> library (release-1.4.3) is not robust for corrupt files (ORC-315). We should 
> enable it after a new version of the ORC library is released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7559) Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps

2018-09-12 Thread Csaba Ringhofer (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612353#comment-16612353
 ] 

Csaba Ringhofer commented on IMPALA-7559:
-

Sent a fix to review: https://gerrit.cloudera.org/#/c/11431/

> Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps
> -
>
> Key: IMPALA-7559
> URL: https://issues.apache.org/jira/browse/IMPALA-7559
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: correctness, parquet, wrongresults
>
> UPDATE: the issue turned out to be different than I first thought, see my 
> last comment. I will update the description with more details later.
> If the min/max value of a timestamp column chunk is during the hour of the 
> Summer->Winter dst change (UTC+2 -> UTC+1 in CET) then stat filtering can 
> drop row groups that contain rows that would be "ok" for the predicate 
> otherwise.
> To reproduce (on current master branch):
> {code}
> 1. it is assumed that the timezone is CET and that flag 
> convert_legacy_hive_parquet_utc_timestamps is enabled
> ( export TZ=CET; bin/start-impala-cluster.py 
> --impalad_args="-convert_legacy_hive_parquet_utc_timestamps=true" )
> 2. create a table in hive and fill data in 3 inserts to create 3 files:
> create table t (i int, d timestamp) stored as parquet;
> insert into t values (1, "2017-10-29 02:30:00"), (2, "2018-10-28 02:30:00");
> insert into t values (3, "2018-10-28 02:30:00");
> insert into t values (4, "2017-10-29 02:30:00")
> 3. Query from Impala
> set num_nodes=1;
> select * from t; -- returns all 4 values (same as Hive) 
> select * from t where d = "2017-10-29 02:30:00"; -- returns 1 in Impala (Hive 
> returns 1,4)
> select * from t where d = "2018-10-28 02:30:00"; -- returns 2 in Impala (Hive 
> returns 2,3)
> profile; -- NumStatsFilteredRowGroups: 2 (only one row group should have been 
> stat filtered)
> select * from t where d = "2018-10-28 02:30:00" or i = 5; -- returns 2 and 3 
> in Impala (same as Hive), because the "or" part disabled stat filtering
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-7559) Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps

2018-09-12 Thread Csaba Ringhofer (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7559 started by Csaba Ringhofer.
---
> Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps
> -
>
> Key: IMPALA-7559
> URL: https://issues.apache.org/jira/browse/IMPALA-7559
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: correctness, parquet, wrongresults
>
> UPDATE: the issue turned out to be different than I first thought, see my 
> last comment. I will update the description with more details later.
> If the min/max value of a timestamp column chunk is during the hour of the 
> Summer->Winter dst change (UTC+2 -> UTC+1 in CET) then stat filtering can 
> drop row groups that contain rows that would be "ok" for the predicate 
> otherwise.
> To reproduce (on current master branch):
> {code}
> 1. it is assumed that the timezone is CET and that flag 
> convert_legacy_hive_parquet_utc_timestamps is enabled
> ( export TZ=CET; bin/start-impala-cluster.py 
> --impalad_args="-convert_legacy_hive_parquet_utc_timestamps=true" )
> 2. create a table in hive and fill data in 3 inserts to create 3 files:
> create table t (i int, d timestamp) stored as parquet;
> insert into t values (1, "2017-10-29 02:30:00"), (2, "2018-10-28 02:30:00");
> insert into t values (3, "2018-10-28 02:30:00");
> insert into t values (4, "2017-10-29 02:30:00")
> 3. Query from Impala
> set num_nodes=1;
> select * from t; -- returns all 4 values (same as Hive) 
> select * from t where d = "2017-10-29 02:30:00"; -- returns 1 in Impala (Hive 
> returns 1,4)
> select * from t where d = "2018-10-28 02:30:00"; -- returns 2 in Impala (Hive 
> returns 2,3)
> profile; -- NumStatsFilteredRowGroups: 2 (only one row group should have been 
> stat filtered)
> select * from t where d = "2018-10-28 02:30:00" or i = 5; -- returns 2 and 3 
> in Impala (same as Hive), because the "or" part disabled stat filtering
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6772) Enable test_scanners_fuzz for ORC format

2018-09-12 Thread Quanlong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612138#comment-16612138
 ] 

Quanlong Huang commented on IMPALA-6772:


The new version of the ORC lib adds support for reading files in HDFS, so it depends on 
hadoop2. We need to add the '-DBUILD_LIBHDFSPP=off' cmake option when building the orc 
lib until ORC-400 is fixed. However, there are still some other errors to fix, like:
{code}
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv1.cc.o): requires unsupported dynamic reloc 11; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv1.cc.o): requires unsupported dynamic reloc 11; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv1.cc.o): requires dynamic R_X86_64_PC32 reloc against '__cxa_allocate_exception' which may overflow at runtime; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv1.cc.o): requires unsupported dynamic reloc 11; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv1.cc.o): requires dynamic R_X86_64_32 reloc which may overflow at runtime; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv1.cc.o): requires dynamic R_X86_64_32 reloc which may overflow at runtime; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv2.cc.o): requires dynamic R_X86_64_PC32 reloc against '_ZN3orc16PositionProvider4nextEv' which may overflow at runtime; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv2.cc.o): requires dynamic R_X86_64_32 reloc which may overflow at runtime; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv2.cc.o): requires unsupported dynamic reloc 11; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv2.cc.o): requires unsupported dynamic reloc 11; recompile with -fPIC
/mnt/volume1/impala-orc/incubator-impala/toolchain/binutils-2.26.1/bin/ld.gold: error: ../../../toolchain/orc-1.5.2-p1/lib/liborc.a(RLEv2.cc.o): requires dynamic R_X86_64_32 reloc which may overflow at runtime; recompile with -fPIC
{code}

> Enable test_scanners_fuzz for ORC format
> 
>
> Key: IMPALA-6772
> URL: https://issues.apache.org/jira/browse/IMPALA-6772
> Project: IMPALA
>  Issue Type: Test
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Currently, we haven't enabled test_scanner_fuzz for ORC yet, since the ORC 
> library (release-1.4.3) is not robust for corrupt files (ORC-315). We should 
> enable it after a new version of the ORC library is released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7559) Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps

2018-09-12 Thread Csaba Ringhofer (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer reassigned IMPALA-7559:
---

Assignee: Csaba Ringhofer

> Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps
> -
>
> Key: IMPALA-7559
> URL: https://issues.apache.org/jira/browse/IMPALA-7559
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: correctness, parquet, wrongresults
>
> UPDATE: the issue turned out to be different than I first thought, see my 
> last comment. I will update the description with more details later.
> If the min/max value of a timestamp column chunk is during the hour of the 
> Summer->Winter dst change (UTC+2 -> UTC+1 in CET) then stat filtering can 
> drop row groups that contain rows that would be "ok" for the predicate 
> otherwise.
> To reproduce (on current master branch):
> {code}
> 1. it is assumed that the timezone is CET and that flag 
> convert_legacy_hive_parquet_utc_timestamps is enabled
> ( export TZ=CET; bin/start-impala-cluster.py 
> --impalad_args="-convert_legacy_hive_parquet_utc_timestamps=true" )
> 2. create a table in hive and fill data in 3 inserts to create 3 files:
> create table t (i int, d timestamp) stored as parquet;
> insert into t values (1, "2017-10-29 02:30:00"), (2, "2018-10-28 02:30:00");
> insert into t values (3, "2018-10-28 02:30:00");
> insert into t values (4, "2017-10-29 02:30:00")
> 3. Query from Impala
> set num_nodes=1;
> select * from t; -- returns all 4 values (same as Hive) 
> select * from t where d = "2017-10-29 02:30:00"; -- returns 1 in Impala (Hive 
> returns 1,4)
> select * from t where d = "2018-10-28 02:30:00"; -- returns 2 in Impala (Hive 
> returns 2,3)
> profile; -- NumStatsFilteredRowGroups: 2 (only one row group should have been 
> stat filtered)
> select * from t where d = "2018-10-28 02:30:00" or i = 5; -- returns 2 and 3 
> in Impala (same as Hive), because the "or" part disabled stat filtering
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7559) Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps

2018-09-12 Thread Csaba Ringhofer (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612104#comment-16612104
 ] 

Csaba Ringhofer edited comment on IMPALA-7559 at 9/12/18 1:05 PM:
--

Yet another update on this: the issue only occurs if all values in the row 
group are equal. The reason is that normally parquet-mr does not write 
timestamp statistics for int96 timestamps, because it considers the ordering 
undefined. The case when min==max is an exception, because ordering doesn't 
matter in this case. This logic is at 
https://github.com/apache/parquet-mr/blob/b4198be200e7e2df82bc9a18d54c8cd16aa156ac/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L571.

Currently Impala only does utc->local conversion if the Parquet file was 
written by parquet-mr (and convert_legacy_hive_parquet_utc_timestamps is true), 
so the issue only occurs in this specific case.
Parquet-mr writes statistics that are actually used by Impala only since 
PARQUET-1025, so the issue occurs only with relatively new Parquet-mr and any 
Impala that uses Parquet stats.


was (Author: csringhofer):
Yet another update on this: the issue only occurs if all values in the row 
group are equal. The reason is that normally parquet-mr does not write 
timestamp statistics for int96 timestamps, because it considers the ordering 
undefined. The case when min==max is an exception, because ordering doesn't 
matter in this case. This logic is at 
https://github.com/apache/parquet-mr/blob/b4198be200e7e2df82bc9a18d54c8cd16aa156ac/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L571.

Currently Impala only does utc->local conversion if the Parquet file was 
written by parquet-mr (and convert_legacy_hive_parquet_utc_timestamps is true), 
so the issue only occurs in this specific case.
Parquet-mr writes statistics that actually used by Impala only since 
PARQUET-1025, so the issue occurs only with relatively new Parquet-mr and any 
Impala that uses Parquet stats.

> Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps
> -
>
> Key: IMPALA-7559
> URL: https://issues.apache.org/jira/browse/IMPALA-7559
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Blocker
>  Labels: correctness, parquet, wrongresults
>
> UPDATE: the issue turned out to be different than I first thought, see my 
> last comment. I will update the description with more details later.
> If the min/max value of a timestamp column chunk is during the hour of the 
> Summer->Winter dst change (UTC+2 -> UTC+1 in CET) then stat filtering can 
> drop row groups that contain rows that would be "ok" for the predicate 
> otherwise.
> To reproduce (on current master branch):
> {code}
> 1. it is assumed that the timezone is CET and that flag 
> convert_legacy_hive_parquet_utc_timestamps is enabled
> ( export TZ=CET; bin/start-impala-cluster.py 
> --impalad_args="-convert_legacy_hive_parquet_utc_timestamps=true" )
> 2. create a table in hive and fill data in 3 inserts to create 3 files:
> create table t (i int, d timestamp) stored as parquet;
> insert into t values (1, "2017-10-29 02:30:00"), (2, "2018-10-28 02:30:00");
> insert into t values (3, "2018-10-28 02:30:00");
> insert into t values (4, "2017-10-29 02:30:00")
> 3. Query from Impala
> set num_nodes=1;
> select * from t; -- returns all 4 values (same as Hive) 
> select * from t where d = "2017-10-29 02:30:00"; -- returns 1 in Impala (Hive 
> returns 1,4)
> select * from t where d = "2018-10-28 02:30:00"; -- returns 2 in Impala (Hive 
> returns 2,3)
> profile; -- NumStatsFilteredRowGroups: 2 (only one row group should have been 
> stat filtered)
> select * from t where d = "2018-10-28 02:30:00" or i = 5; -- returns 2 and 3 
> in Impala (same as Hive), because the "or" part disabled stat filtering
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7559) Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps

2018-09-12 Thread Csaba Ringhofer (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612104#comment-16612104
 ] 

Csaba Ringhofer commented on IMPALA-7559:
-

Yet another update on this: the issue only occurs if all values in the row 
group are equal. The reason is that normally parquet-mr does not write 
timestamp statistics for int96 timestamps, because it considers the ordering 
undefined. The case when min==max is an exception, because ordering doesn't 
matter in this case. This logic is at 
https://github.com/apache/parquet-mr/blob/b4198be200e7e2df82bc9a18d54c8cd16aa156ac/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L571.

Currently Impala only does utc->local conversion if the Parquet file was 
written by parquet-mr (and convert_legacy_hive_parquet_utc_timestamps is true), 
so the issue only occurs in this specific case.
Parquet-mr writes statistics that actually used by Impala only since 
PARQUET-1025, so the issue occurs only with relatively new Parquet-mr and any 
Impala that uses Parquet stats.
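
A minimal sketch of the rule described in this comment, for illustration only: it is not the actual 
parquet-mr code (see the linked ParquetMetadataConverter for the real logic), and the class and 
method names are made up. It only captures the stated behavior that min/max statistics for an INT96 
timestamp column chunk are kept solely when min equals max, because the sort order is otherwise 
considered undefined.

{code:java}
import java.util.Arrays;
import java.util.Optional;

// Illustration of the "keep INT96 stats only when min == max" rule described above.
public class Int96StatsRuleSketch {
  static Optional<byte[][]> publishableInt96Stats(byte[] min, byte[] max) {
    if (Arrays.equals(min, max)) {
      return Optional.of(new byte[][] {min, max});  // min == max: ordering cannot matter
    }
    return Optional.empty();  // ordering undefined for INT96: drop the stats
  }
}
{code}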

> Parquet stat filtering ignores convert_legacy_hive_parquet_utc_timestamps
> -
>
> Key: IMPALA-7559
> URL: https://issues.apache.org/jira/browse/IMPALA-7559
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Blocker
>  Labels: correctness, parquet, wrongresults
>
> UPDATE: the issue turned out to be different than I first thought, see my 
> last comment. I will update the description with more details later.
> If the min/max value of a timestamp column chunk is during the hour of the 
> Summer->Winter dst change (UTC+2 -> UTC+1 in CET) then stat filtering can 
> drop row groups that contain rows that would be "ok" for the predicate 
> otherwise.
> To reproduce (on current master branch):
> {code}
> 1. it is assumed that the timezone is CET and that flag 
> convert_legacy_hive_parquet_utc_timestamps is enabled
> ( export TZ=CET; bin/start-impala-cluster.py 
> --impalad_args="-convert_legacy_hive_parquet_utc_timestamps=true" )
> 2. create a table in hive and fill data in 3 inserts to create 3 files:
> create table t (i int, d timestamp) stored as parquet;
> insert into t values (1, "2017-10-29 02:30:00"), (2, "2018-10-28 02:30:00");
> insert into t values (3, "2018-10-28 02:30:00");
> insert into t values (4, "2017-10-29 02:30:00")
> 3. Query from Impala
> set num_nodes=1;
> select * from t; -- returns all 4 values (same as Hive) 
> select * from t where d = "2017-10-29 02:30:00"; -- returns 1 in Impala (Hive 
> returns 1,4)
> select * from t where d = "2018-10-28 02:30:00"; -- returns 2 in Impala (Hive 
> returns 2,3)
> profile; -- NumStatsFilteredRowGroups: 2 (only one row group should have been 
> stat filtered)
> select * from t where d = "2018-10-28 02:30:00" or i = 5; -- returns 2 and 3 
> in Impala (same as Hive), because the "or" part disabled stat filtering
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7562) Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Unknown.

2018-09-12 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-7562.
---
Resolution: Not A Bug

You would probably have to recreate your JDBC connections and implement retry 
logic within your app.
Resolving, since Impala itself is working fine and things recover once you restart 
the web application.
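
A minimal sketch of the kind of retry logic suggested above, assuming the Cloudera Impala JDBC 
driver is on the classpath. The JDBC URL is taken from the reporter's configuration; everything 
else (class and method names, the attempt count) is made up for illustration, and a real 
application would do this inside or alongside its connection pool rather than with raw 
DriverManager calls.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ImpalaRetrySketch {
  private static final String URL = "jdbc:impala://39.108.9.1:21050/ADM_DB;AuthMech=0";

  // Runs the query, opening a fresh connection per attempt so that a connection that went
  // stale when the Impala service was restarted is never reused. Assumes maxAttempts >= 1.
  static int runWithRetry(String sql, int maxAttempts) throws SQLException {
    SQLException last = null;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
      try (Connection conn = DriverManager.getConnection(URL);
           Statement stmt = conn.createStatement();
           ResultSet rs = stmt.executeQuery(sql)) {
        int rows = 0;
        while (rs.next()) rows++;
        return rows;
      } catch (SQLException e) {
        last = e;  // e.g. "(500593) Communication link failure"; retry with a new connection
      }
    }
    throw last;
  }
}
{code}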

> Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.
> 
>
> Key: IMPALA-7562
> URL: https://issues.apache.org/jira/browse/IMPALA-7562
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.12.0
> Environment: centOS 7
>Reporter: ruiliang
>Priority: Major
>  Labels: impala, impala_jdbc
> Attachments: ecliseDeubgCosnle.log
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
>  
> I encountered a very strange problem. Spring Boot is configured to query Impala 
> over JDBC. Under normal circumstances all SQL queries work and the statements are 
> fine. However, after I restart the Impala service, this exception (or the same 
> error code) is reported for every SQL query, and only after I restart my Spring 
> Boot web service do all queries work again; I have tried this many times. I looked 
> through the server logs and it seems the server never received the request. Why 
> does this happen? Is it the driver? My Impala cluster is three nodes built under 
> CDH.
> I really cannot find the reason, please help take a look. Thank you.
>  
>  ClouderaImpalaJDBC41-2.6.4.1005.zip
> ImpalaJDBC41.jar
> {code:java}
> // code placeholder
> spring.secondary-datasource.type=com.cloudera.impala.jdbc41.Driver
> datasource.url=jdbc:impala://39.108.9.1:21050/ADM_DB;AuthMech=0;LogLevel=5;LogPath=d:\\temp;
> spring.secondary-datasource.druid.initialSize=2
> spring.secondary-datasource.druid.minIdle=2
> spring.secondary-datasource.druid.maxActive=30
> {code}
>  
>  
>  
> {code:java}
> // code placeholder
> Resolving exception from handler [public com.jx.data.biz.bean.ResultBean 
> com.jx.data.biz.distribution.web.DistributionAnalysisController.show(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse,com.jx.data.biz.distribution.bean.DistributionAnalysisBean)]:
>  org.springframework.dao.DataAccessResourceFailureException: 
> PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL 
> >=1 and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN 
> METRIC_VAL >=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE 
> WHEN METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
> A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and 
> BUS_DT<='2018-09-11' and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY 
> BUS_DT order by BUS_DT asc ]; [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.; 
> nested exception is java.sql.SQLException: 
> [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to 
> connect to server. Reason: Unknown.
> 2018-09-12 14:59:40.035 DEBUG [jx-data-analysis,,,] 10628 --- 
> [nio-9005-exec-2] o.s.web.servlet.DispatcherServlet : Could not complete 
> request
> Could not complete request
> org.springframework.dao.DataAccessResourceFailureException: 
> PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL 
> >=1 and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN 
> METRIC_VAL >=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE 
> WHEN METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
> A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and 
> BUS_DT<='2018-09-11' and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY 
> BUS_DT order by BUS_DT asc ]; [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.; 
> nested exception is java.sql.SQLException: 
> [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to 
> connect to server. Reason: Unknown.
> at 
> org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
> at 
> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
> at 
> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]

[jira] [Resolved] (IMPALA-7562) Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Unknown.

2018-09-12 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-7562.
---
Resolution: Not A Bug

You would probably have to recreate your JDBC connections and implement retry 
logic within your app.
Resolving since Impala itself is working fine; things recover once you restart 
the web application.
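
As a rough illustration of that suggestion (application-side handling only, not 
an Impala or driver feature), here is a minimal Java sketch of retry logic 
around a Spring JdbcTemplate query; the class and method names are hypothetical:
{code:java}
// Illustrative sketch only: retry a query when the pooled JDBC connection is
// stale (e.g. after an impalad restart). All names here are hypothetical.
import java.util.List;
import java.util.Map;

import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.jdbc.core.JdbcTemplate;

public class RetryingQueryHelper {
    private static final int MAX_ATTEMPTS = 3;

    // Each attempt borrows a connection from the DataSource again, so a retry
    // can succeed once the dead sockets have been cycled out of the pool.
    public static List<Map<String, Object>> queryWithRetry(JdbcTemplate jdbc, String sql) {
        DataAccessResourceFailureException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return jdbc.queryForList(sql);
            } catch (DataAccessResourceFailureException e) {
                // Matches the (500593) "Communication link failure" seen above.
                last = e;
            }
        }
        throw last;
    }
}
{code}
If the Druid pool is configured with a validation query and test-on-borrow / 
test-while-idle, stale sockets are discarded before they ever reach application 
code, which is usually the cleaner complement to a retry like this.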

> Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.
> 
>
> Key: IMPALA-7562
> URL: https://issues.apache.org/jira/browse/IMPALA-7562
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.12.0
> Environment: CentOS 7
>Reporter: ruiliang
>Priority: Major
>  Labels: impala, impala_jdbc
> Attachments: ecliseDeubgCosnle.log
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
>  
> I encountered a very strange problem. Spring Boot is configured to run Impala 
> JDBC queries. Under normal circumstances all SQL queries work and the 
> statements themselves are fine. However, after I restart the Impala service, 
> this exception is thrown and the same error code is returned for every SQL 
> query. Only after I restart my Spring Boot web service do all the queries 
> work again; I have tried this many times. I looked through the server logs 
> and found that the server did not seem to have received the requests at all. 
> Why does such a low-level problem occur? Is it the driver? My Impala is a 
> three-node cluster built under CDH.
> I really cannot figure out the cause; please help take a look. Thank you.
>  
>  ClouderaImpalaJDBC41-2.6.4.1005.zip
> ImpalaJDBC41.jar
> {code:java}
> // code placeholder
> spring.secondary-datasource.type=com.cloudera.impala.jdbc41.Driver
> datasource.url=jdbc:impala://39.108.9.1:21050/ADM_DB;AuthMech=0;LogLevel=5;LogPath=d:\\temp;
> spring.secondary-datasource.druid.initialSize=2
> spring.secondary-datasource.druid.minIdle=2
> spring.secondary-datasource.druid.maxActive=30
> {code}
>  
>  
>  
> {code:java}
> // code placeholder
> Resolving exception from handler [public com.jx.data.biz.bean.ResultBean 
> com.jx.data.biz.distribution.web.DistributionAnalysisController.show(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse,com.jx.data.biz.distribution.bean.DistributionAnalysisBean)]:
>  org.springframework.dao.DataAccessResourceFailureException: 
> PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL 
> >=1 and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN 
> METRIC_VAL >=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE 
> WHEN METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
> A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and 
> BUS_DT<='2018-09-11' and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY 
> BUS_DT order by BUS_DT asc ]; [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.; 
> nested exception is java.sql.SQLException: 
> [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to 
> connect to server. Reason: Unknown.
> 2018-09-12 14:59:40.035 DEBUG [jx-data-analysis,,,] 10628 --- 
> [nio-9005-exec-2] o.s.web.servlet.DispatcherServlet : Could not complete 
> request
> Could not complete request
> org.springframework.dao.DataAccessResourceFailureException: 
> PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL 
> >=1 and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN 
> METRIC_VAL >=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE 
> WHEN METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
> A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and 
> BUS_DT<='2018-09-11' and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY 
> BUS_DT order by BUS_DT asc ]; [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.; 
> nested exception is java.sql.SQLException: 
> [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to 
> connect to server. Reason: Unknown.
> at 
> org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
> at 
> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
> at 
> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]

[jira] [Updated] (IMPALA-7562) Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Unknown.

2018-09-12 Thread ruiliang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ruiliang updated IMPALA-7562:
-
Description: 
 

I encountered a very strange problem. Spring Boot is configured to run Impala 
JDBC queries. Under normal circumstances all SQL queries work and the 
statements themselves are fine. However, after I restart the Impala service, 
this exception is thrown and the same error code is returned for every SQL 
query. Only after I restart my Spring Boot web service do all the queries work 
again; I have tried this many times. I looked through the server logs and found 
that the server did not seem to have received the requests at all. Why does 
such a low-level problem occur? Is it the driver? My Impala is a three-node 
cluster built under CDH.

I really cannot figure out the cause; please help take a look. Thank you.

 

 ClouderaImpalaJDBC41-2.6.4.1005.zip

ImpalaJDBC41.jar
{code:java}
// code placeholder
spring.secondary-datasource.type=com.cloudera.impala.jdbc41.Driver
datasource.url=jdbc:impala://39.108.9.1:21050/ADM_DB;AuthMech=0;LogLevel=5;LogPath=d:\\temp;
spring.secondary-datasource.druid.initialSize=2
spring.secondary-datasource.druid.minIdle=2
spring.secondary-datasource.druid.maxActive=30
{code}
 

 

 
{code:java}
// code placeholder

Resolving exception from handler [public com.jx.data.biz.bean.ResultBean 
com.jx.data.biz.distribution.web.DistributionAnalysisController.show(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse,com.jx.data.biz.distribution.bean.DistributionAnalysisBean)]:
 org.springframework.dao.DataAccessResourceFailureException: 
PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL >=1 
and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN METRIC_VAL 
>=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE WHEN 
METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and BUS_DT<='2018-09-11' 
and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY BUS_DT order by BUS_DT 
asc ]; [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed 
to connect to server. Reason: Unknown.; nested exception is 
java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link 
failure. Failed to connect to server. Reason: Unknown.
2018-09-12 14:59:40.035 DEBUG [jx-data-analysis,,,] 10628 --- [nio-9005-exec-2] 
o.s.web.servlet.DispatcherServlet : Could not complete request
Could not complete request

org.springframework.dao.DataAccessResourceFailureException: 
PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL >=1 
and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN METRIC_VAL 
>=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE WHEN 
METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and BUS_DT<='2018-09-11' 
and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY BUS_DT order by BUS_DT 
asc ]; [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed 
to connect to server. Reason: Unknown.; nested exception is 
java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link 
failure. Failed to connect to server. Reason: Unknown.
at 
org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at 
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at 
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at 
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:649) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:684) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:716) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:726) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:776) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at 
com.jx.data.biz.distribution.service.DistributionAnalysisService.getData(DistributionAnalysisService.java:189)
 ~[classes/:na]
at 
com.jx.data.biz.distribution.web.DistributionAnalysisController.show(DistributionAnalysisController.java:69)
 ~[classes/:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 

[jira] [Created] (IMPALA-7562) Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Unknown.

2018-09-12 Thread ruiliang (JIRA)
ruiliang created IMPALA-7562:


 Summary: Caused by: java.sql.SQLException: 
[Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to 
connect to server. Reason: Unknown.
 Key: IMPALA-7562
 URL: https://issues.apache.org/jira/browse/IMPALA-7562
 Project: IMPALA
  Issue Type: Bug
  Components: Clients
Affects Versions: Impala 2.12.0
 Environment: CentOS 7
Reporter: ruiliang
 Attachments: ecliseDeubgCosnle.log

 

I encountered a very strange problem. Spring Boot is configured to run Impala 
JDBC queries. Under normal circumstances all SQL queries work and the 
statements themselves are fine. However, after I restart the Impala service, 
this exception is thrown and the same error code is returned for every SQL 
query. Only after I restart my Spring Boot web service do all the queries work 
again; I have tried this many times. I looked through the server logs and found 
that the server did not seem to have received the requests at all. Why does 
such a low-level problem occur? Is it the driver? My Impala is a three-node 
cluster built under CDH.

I really cannot figure out the cause; please help take a look. Thank you.

 

impala_jdbc_2.6.4.1005.zip

 
{code:java}
// code placeholder
spring.secondary-datasource.type=com.cloudera.impala.jdbc41.Driver
datasource.url=jdbc:impala://39.108.9.1:21050/ADM_DB;AuthMech=0;LogLevel=5;LogPath=d:\\temp;
spring.secondary-datasource.druid.initialSize=2
spring.secondary-datasource.druid.minIdle=2
spring.secondary-datasource.druid.maxActive=30
{code}
 

 

 
{code:java}
// code placeholder

Resolving exception from handler [public com.jx.data.biz.bean.ResultBean 
com.jx.data.biz.distribution.web.DistributionAnalysisController.show(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse,com.jx.data.biz.distribution.bean.DistributionAnalysisBean)]:
 org.springframework.dao.DataAccessResourceFailureException: 
PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL >=1 
and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN METRIC_VAL 
>=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE WHEN 
METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and BUS_DT<='2018-09-11' 
and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY BUS_DT order by BUS_DT 
asc ]; [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed 
to connect to server. Reason: Unknown.; nested exception is 
java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link 
failure. Failed to connect to server. Reason: Unknown.
2018-09-12 14:59:40.035 DEBUG [jx-data-analysis,,,] 10628 --- [nio-9005-exec-2] 
o.s.web.servlet.DispatcherServlet : Could not complete request
Could not complete request

org.springframework.dao.DataAccessResourceFailureException: 
PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL >=1 
and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN METRIC_VAL 
>=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE WHEN 
METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and BUS_DT<='2018-09-11' 
and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY BUS_DT order by BUS_DT 
asc ]; [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed 
to connect to server. Reason: Unknown.; nested exception is 
java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link 
failure. Failed to connect to server. Reason: Unknown.
at 
org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at 
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at 
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at 
org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
 ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:649) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:684) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:716) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:726) 
~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:776) 

[jira] [Commented] (IMPALA-5463) OOM during clone() causes crash in libjvm.so!java_start()

2018-09-12 Thread Antoni Ivanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611760#comment-16611760
 ] 

Antoni Ivanov commented on IMPALA-5463:
---

Thanks 

The metrics I've been using to monitor JVM memory are 
jvm.total.peak-current-usage-bytes and jvm.total.current-usage-bytes, and 
they've been under 20G most of the time (we've set -Xmx to 32G).

I still decided to double it to 64GB, and noticed that 
jvm.ps-eden-space.peak-max-usage-bytes spiked to 21GB very quickly and 
jvm.ps-old-gen.peak-max-usage-bytes spiked to 43G, with CPU at 100%.
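
For routine monitoring, a minimal sketch of how those two jvm.total metrics 
could be polled over HTTP, assuming the impalad debug web server publishes a 
JSON metrics page on port 25000; the /jsonmetrics path and the line-oriented 
filtering are assumptions here, so adjust host, port, and path to your 
deployment:
{code:java}
// Hedged sketch: dump the two JVM metrics discussed above from an impalad's
// debug web server. The /jsonmetrics path is an assumption; adjust as needed.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class JvmMetricsProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        URL url = new URL("http://" + host + ":25000/jsonmetrics");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Crude string filter instead of a JSON parser; enough for a probe.
                if (line.contains("jvm.total.current-usage-bytes")
                        || line.contains("jvm.total.peak-current-usage-bytes")) {
                    System.out.println(line.trim());
                }
            }
        }
    }
}
{code}
Note that, per the symptoms listed next, the web UI on :25000 may itself be 
unreachable while the node is in this state, so a probe like this only helps 
before and after the fact.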

Other symptoms are:
 * I cannot open the Impala UI - _node:25000_
 * I also cannot connect with impala-shell to that node
 * Warnings in the logs like _"Missing tables were not received in 12ms. Load 
request will be retried."_

 * The stack traces of the highest-CPU threads are similar (GC related), but 
there was also one like:

{quote}
Thread 2 (Thread 0x7ef2fdadc700 (LWP 9467)):
#0 0x7f042a1e1a9b in recv () from /lib64/libpthread.so.0
#1 0x01b0b3fd in apache::thrift::transport::TSocket::read(unsigned 
char*, unsigned int) ()
#2 0x01b0e663 in unsigned int 
apache::thrift::transport::readAll(apache::thrift::transport::TSocket&,
 unsigned char*, unsigned int) ()
#3 0x00b5428d in 
apache::thrift::transport::TSaslTransport::read(unsigned char*, unsigned int) ()
#4 0x01b14e87 in 
apache::thrift::transport::TBufferedTransport::readSlow(unsigned char*, 
unsigned int) ()
#5 0x0081454e in unsigned int 
apache::thrift::transport::readAll(apache::thrift::transport::TBufferBase&,
 unsigned char*, unsigned int) ()
#6 0x009ee2e1 in unsigned int 
apache::thrift::protocol::TBinaryProtocolT::readStringBody(std::string&,
 int) ()
#7 0x009ee5ce in 
apache::thrift::protocol::TVirtualProtocol,
 apache::thrift::protocol::TProtocolDefaults>::readString_virt(std::string&) ()
#8 0x00da3e26 in 
impala::TTopicItem::read(apache::thrift::protocol::TProtocol*) ()
#9 0x00da4898 in 
impala::TTopicDelta::read(apache::thrift::protocol::TProtocol*) ()
#10 0x00da6476 in 
impala::TUpdateStateRequest::read(apache::thrift::protocol::TProtocol*) ()
#11 0x00da8c9d in 
impala::StatestoreSubscriber_UpdateState_args::read(apache::thrift::protocol::TProtocol*)
 ()
#12 0x00daa3dc in 
impala::StatestoreSubscriberProcessor::process_UpdateState(int, 
apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, 
void*) ()
#13 0x00da9774 in 
impala::StatestoreSubscriberProcessor::dispatchCall(apache::thrift::protocol::TProtocol*,
 apache::thrift::protocol::TProtocol*, std::string const&, int, void*) ()
{quote}

> OOM during clone() causes crash in libjvm.so!java_start()
> -
>
> Key: IMPALA-5463
> URL: https://issues.apache.org/jira/browse/IMPALA-5463
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Lars Volker
>Priority: Critical
> Attachments: stack-trace-threads-high-cpu.txt
>
>
> Running out of memory seems to cause a crash in libjvm.so!java_start() right 
> after calling clone(). Here is the stack trace of the crashing thread from a 
> minidump.
> {noformat}
>  0  libjvm.so!PSParallelCompact::MarkAndPushClosure::do_oop(oopDesc**) + 0x86
>  1  libjvm.so!OopMapSet::all_do(frame const*, RegisterMap const*, 
> OopClosure*, void (*)(oopDesc**, oopDesc**), OopClosure*) + 0x2fb
>  2  libjvm.so!frame::oops_do_internal(OopClosure*, CLDClosure*, 
> CodeBlobClosure*, RegisterMap*, bool) + 0xa2
>  3  libjvm.so!JavaThread::oops_do(OopClosure*, CLDClosure*, CodeBlobClosure*) 
> + 0x161
>  4  libjvm.so!ThreadRootsMarkingTask::do_it(GCTaskManager*, unsigned int) + 
> 0x106
>  5  libjvm.so!GCTaskThread::run() + 0x12f
>  6  libjvm.so!java_start(Thread*) + 0x108
>  7  libpthread-2.12.so!start_thread + 0xd1
>  8  libc-2.12.so!clone + 0x6d
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org