[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-04-27 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/819
  
2 additional comments.

1. Maybe we need include the length field in MaterializedField.toString() 
for varchar type if length is set. 
2. Will a varchar(5) vs varchar(10) be regarded as schema change? The 
underneath value vector would be same. Not sure if we may see more schema 
change exception. I tried to come up with some example, but could not find one. 
It may be worthwhile to think it over. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-05 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/819
  
@jinfengni, @paul-rogers made changes after CR, Please review when possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-05 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/819
  
My understanding is Drill would hit a schema change between column with 
type of varchar(5) and varchar(10).  However, the length calculated in this PR 
is from the query metadeta (function parameter, string literal), which will not 
change across different batches. In that sense, we should not see schema change 
exception, just because of varchar length.

The following code shows that varchar(5) is different from varchar(10), 
also different from varchar without setting length. 

```java
   TypeProtos.MajorType varchar5 = TypeProtos.MajorType.newBuilder()
.setMinorType(TypeProtos.MinorType.VARCHAR)
.setMode(TypeProtos.DataMode.OPTIONAL)
.setPrecision(5)
.build();
TypeProtos.MajorType varchar10 = TypeProtos.MajorType.newBuilder()
.setMinorType(TypeProtos.MinorType.VARCHAR)
.setMode(TypeProtos.DataMode.OPTIONAL)
.setPrecision(10)
.build();
TypeProtos.MajorType varcharNoLength = TypeProtos.MajorType.newBuilder()
.setMinorType(TypeProtos.MinorType.VARCHAR)
.setMode(TypeProtos.DataMode.OPTIONAL)
.build();

System.out.println(varchar5.equals(varchar10) ? "varchar5 equals 
varchr10" : "varchar5 NOT equals varchr10" );
System.out.println(varchar5.equals(varcharNoLength) ? "varchar5 equals 
varchrNoLength" : "varchar5 NOT equals varchrNoLength" );
```

output:
```
varchar5 NOT equals varchr10
varchar5 NOT equals varchrNoLength
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-05 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/819
  
Regarding Paul's suggestion of using sampling (either 1st batch, or `n` 
batches), if the sampled length is returned to client as metadata for query 
result set, it could cause quite big problems for client. If the sampled max 
length is 5, client expects to see varchar up to 5 chars. If a new batch 
arrives with varchar(10), it would either crash the client, or make the client 
show incorrect result.  AFAIK, that's exactly what happened when Sean was 
working on 'LIMIT 0' optimization, and if there is an inconsistency of the type 
returned between 'LIMIT 0' and one returned from a regular query.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-06 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/819
  
@jinfengni 
1. Agree with the point
> However, the length calculated in this PR is from the query metadeta 
(function parameter, string literal), which will not change across different 
batches. In that sense, we should not see schema change exception, just because 
of varchar length.

2. I wanted to clarify regarding schema change, if I understood correctly 
from the code, two schemas are compared using `BatchSchema#equals` method [1] 
and when it compares two major types, precision and scale are not taken into 
account [2]. Am I correct?

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java#L101
[2] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java#L144


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-11 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/819
  
+1

The revised commit looks good to me. Thanks for taking effort to separate 
the return type inference logic from function scope logic. You may consider 
adding brief description of that change, plus the  refactoring of part of 
function code-gen logic (ValueReference etc).

  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-12 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/819
  
Squashed the commits and updated commit message to include brief 
description of the changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-12 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/819
  
Three comments.

1. The equals() method for BatchSchema and MaterializedField should 
consider all fields, including precision. If it does not, that is likely a bug 
from when those fields were added. If we need a more general rule, then provide 
isSameType(), isSameMajorType() methods to be clearer.

2. It would be *VERY* useful to have a spec for this work. Trying to 
reverse engineer a spec from code changes is quite difficult. It will be hard 
for future maintainers to keep things working correctly without a spec to 
explain.

3. If/when we can extract schema hints from data, then it seems a simple 
process to feed that information into the system created here. So, we can 
tackle that as a separate discussion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #819: DRILL-5419: Calculate return string length for literals & ...

2017-05-12 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/819
  
I agreed with @paul-rogers  that BatchSchema and MaterializedField should 
consider all fields (I once briefly mentioned to @paul-rogers I feel there 
might be bug in the equals/hashCode methods). 

Regarding Paul's 3 point, if we can get the schema infor (varchar length) 
from data, i.e. the underneath storage plugin (RDBMS) provides such 
information, when Drill gets the batch from recordReader, then Arina's patch 
will work the same way as as the input is a string literal. In that sense, the 
work in this patch is complimentary to how we know the table column's varchar 
length. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---