Fwd: Avro deserialization bug - 1.3-SNAPSHOT

2015-11-11 Thread Stefán Baxter
Hi,

Decided to send this to dev* as well.

Can someone please assist me with this problem of Drill distorting string
values that are read from Avro files?

Regards,
 -Stefan

-- Forwarded message --
From: Stefán Baxter 
Date: Wed, Nov 11, 2015 at 10:14 PM
Subject: Re: Avro deserialization bug - 1.3-SNAPSHOT
To: user 


Hi,

Can someone please verify that this is in fact a bug so I can rule out our
own mistakes?

We have recently moved all our logging to Avro to compensate for schema
differences in JSON that were causing various problems, and our latest
release is now impeded by this.
Alternatively, can someone please point me in the right direction if I were
to try to fix this myself?

Regards,
  -Stefán

On Tue, Nov 10, 2015 at 2:41 PM, Stefán Baxter 
wrote:

> Thank you Kamesh.
>
> I have created https://issues.apache.org/jira/browse/DRILL-4056 with the
> description.
> I will send you a confidential test file to your private email.
>
> Regards,
>  -Stefan
>
> On Tue, Nov 10, 2015 at 2:30 PM, Kamesh  wrote:
>
>> Hi Stefán,
>>  Could you please raise a Jira with sample schema and sample input to
>> reproduce it. I will look into this.
>>
>> On Tue, Nov 10, 2015 at 7:55 PM, Stefán Baxter > >
>> wrote:
>>
>> > Hi,
>> >
>> > I have an Avro file that supports the following data/schema:
>> >
>> > {"field":"some", "classification":{"variant":"Gæst"}}
>> >
>> > When I select 10 rows from this file I get:
>> >
>> > +-+
>> > |   EXPR$0|
>> > +-+
>> > | Gæst|
>> > | Voksen  |
>> > | Voksen  |
>> > | Invitation KIF KBH  |
>> > | Invitation KIF KBH  |
>> > | Ordinarie pris KBH  |
>> > | Ordinarie pris KBH  |
>> > | Biljetter 200 krBH  |
>> > | Biljetter 200 krBH  |
>> > | Biljetter 200 krBH  |
>> > +-+
>> >
>> > The bug is that the field values are incorrectly de-serialized and the
>> > value from the previous row is retained if the subsequent row is
>> shorter.
>> >
>> > The sql query:
>> >
>> > "select s.classification.variant variant from dfs. as s limit 10;"
>> >
>> >
>> > That way the  "Ordinarie pris" becomes "Ordinarie pris KBH" because the
>> > previous row had the value "Invitation KIF KBH".
>> >
>> > Regards,
>> >   -Stefán
>> >
>>
>>
>>
>> --
>> Kamesh.
>>
>
>
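The symptom described above (a shorter value inheriting the tail of the longer value from the previous row) is the classic signature of a reused output buffer whose length is not reset between rows. A minimal, hypothetical sketch of that pattern, not Drill's actual Avro reader code:

```python
# Hypothetical illustration of the stale-buffer pattern behind DRILL-4056:
# a reader reuses one output buffer per column but never shrinks the tracked
# length, so a shorter value picks up the tail of the previous longer one.

def read_column_buggy(values):
    buf = bytearray(64)                  # shared, reused output buffer
    out = []
    length = 0
    for v in values:
        data = v.encode("utf-8")
        buf[:len(data)] = data           # copy new bytes over the old ones...
        length = max(length, len(data))  # BUG: length never shrinks
        out.append(buf[:length].decode("utf-8"))
    return out

def read_column_fixed(values):
    buf = bytearray(64)
    out = []
    for v in values:
        data = v.encode("utf-8")
        buf[:len(data)] = data
        out.append(buf[:len(data)].decode("utf-8"))  # use this row's length
    return out

rows = ["Invitation KIF KBH", "Ordinarie pris"]
print(read_column_buggy(rows))  # ['Invitation KIF KBH', 'Ordinarie pris KBH']
print(read_column_fixed(rows))  # ['Invitation KIF KBH', 'Ordinarie pris']
```

Note how the buggy variant reproduces exactly the reported corruption: "Ordinarie pris" becomes "Ordinarie pris KBH" because the previous row ended in " KBH".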


[GitHub] drill pull request: DRILL-3987: Componentize Drill, extracting vec...

2015-11-11 Thread parthchandra
Github user parthchandra commented on the pull request:

https://github.com/apache/drill/pull/250#issuecomment-155971965
  
Let's keep the discussion on DRILL-3940 out of this pull request and take 
it offline. For this patch, I see the changes as mostly benign.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-4076) Unlike hive drill treats _HIVE_DEFAULT_PARTITION_ as a null value

2015-11-11 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4076:


 Summary: Unlike hive drill treats _HIVE_DEFAULT_PARTITION_ as a 
null value
 Key: DRILL-4076
 URL: https://issues.apache.org/jira/browse/DRILL-4076
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Affects Versions: 1.3.0
Reporter: Rahul Challapalli
Priority: Critical


git.commit.id.abbrev=e78e286

Query From Drill : 
{code}
select * from hive.empty_lengthy_p where varchar_col is null;
+--+--+
| int_col  | varchar_col  |
+--+--+
| 3| null |
| 5| null |
| 6| null |
+--+--+
{code}

The same query from hive returns an empty data set. 

Dump of whole table from hive :
{code}
select * from empty_lengthy_p;  
OK
3   __HIVE_DEFAULT_PARTITION__
5   __HIVE_DEFAULT_PARTITION__
6   __HIVE_DEFAULT_PARTITION__
1   dhfawriuueiq dshfjklhfiue eiufhwelfhleiruhj ejfwekjlf hsjdkgfhsdjk  hjd 
hdfkh sdhg dkj hsdhg jds gsdlgd sd hjk sdjhkjdhgsdhg 
2   jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u 
irjfoiej0930j pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
7   jkdshgf jhg sdgj dlsg jsdgjgjkdhgiergergd fgjgioug8945u 
irjfoiej0930j pofkqpgogogj dogj09g djvkldsjgjgirewoie dkflvsd 
vkdvskgjiwegjwe;sdkvjsdgfdgksdjgkdjkdjgksjg sdkjgdsjg skdjggj;sdgjd sk;gjsd
4   sdjklhkhjdfgjhdfgkjhdfkjldfsgjdsfkjhdfmnb,cv
{code}
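The discrepancy comes down to how the partition-column sentinel is interpreted. Hive writes rows whose partition key is NULL under the literal directory name `__HIVE_DEFAULT_PARTITION__` and surfaces it as that literal string at query time, while Drill (per this report) decodes it to SQL NULL, so `IS NULL` matches. A small sketch of the two interpretations, with hypothetical helper names and shortened stand-ins for the long varchar values:

```python
SENTINEL = "__HIVE_DEFAULT_PARTITION__"

def partition_value_as_drill(raw):
    # Per the report, Drill maps the sentinel directory name to SQL NULL
    return None if raw == SENTINEL else raw

def partition_value_as_hive(raw):
    # Hive surfaces the sentinel as a literal, non-null string at read time
    return raw

table = [(3, SENTINEL), (5, SENTINEL), (6, SENTINEL),
         (1, "dhfawriuueiq ..."), (2, "jkdshgf ..."),
         (7, "jkdshgf ..."), (4, "sdjklhkhjdfg...")]

def where_varchar_col_is_null(table, decode):
    return [row for row in table if decode(row[1]) is None]

print(len(where_varchar_col_is_null(table, partition_value_as_drill)))  # 3
print(len(where_varchar_col_is_null(table, partition_value_as_hive)))   # 0
```

The two lengths (3 vs. 0) match the Drill result set and the empty Hive result set shown above.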



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] drill pull request: DRILL-3987: Componentize Drill, extracting vec...

2015-11-11 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/250#issuecomment-155969401
  
I didn't see the comment on DRILL-3940. Will take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-4075) Assertion Error in PruneScanRule when querying empty partitioned hive tables

2015-11-11 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4075:


 Summary: Assertion Error in PruneScanRule when querying empty 
partitioned hive tables
 Key: DRILL-4075
 URL: https://issues.apache.org/jira/browse/DRILL-4075
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization, Storage - Hive
Affects Versions: 1.3.0
Reporter: Rahul Challapalli
 Attachments: error.log

git.commit.id.abbrev=e78e286

Hive DDL :
{code}
CREATE TABLE empty_p (
  int_col INT
)
PARTITIONED BY (varchar_col STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
STORED AS TEXTFILE LOCATION "/drill/testdata/empty_p";
{code}

The above table is empty. Now the query below fails with an assertion error:
{code}
explain plan for select * from hive.empty_lengthy_p where varchar_col is null;
Error: SYSTEM ERROR: AssertionError


[Error Id: d9e8c786-0ff9-4ccc-96aa-cbc7adbd7e03 on qa-node190.qa.lab:31010] 
(state=,code=0)
{code}

Logs are attached
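Since the table has no partitions at all, the pruning step most likely asserts on an empty partition list. A generic sketch of the defensive check, with hypothetical names that do not reflect Drill's actual PruneScanRule API:

```python
def prune_partitions(partitions, predicate):
    # Hypothetical pruning step: asserting a non-empty partition list here
    # would fail for an empty partitioned table, which is consistent with
    # the AssertionError in the report.
    if not partitions:
        # Defensive path: nothing to prune; return the (empty) scan unchanged.
        return []
    return [p for p in partitions if predicate(p)]

print(prune_partitions([], lambda p: p is None))          # []
print(prune_partitions(["a", None], lambda p: p is None))  # [None]
```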





[GitHub] drill pull request: DRILL-3987: Componentize Drill, extracting vec...

2015-11-11 Thread cwestin
Github user cwestin commented on the pull request:

https://github.com/apache/drill/pull/250#issuecomment-155957315
  
And I responded to the feedback on DRILL-3940, wherein I disagree with your 
comments; are you saying your word is final there?

After that one, there would be one more, which hasn't been put together 
yet, because it will want to be the alloc branch rebased (one more time?) with 
the 3940 changes (along with some of the other patches that have already gone 
in) merged. With that rebase, that one should be down to about 30 files, which 
include the core allocator changes, which haven't changed in the past few 
months.




[GitHub] drill pull request: DRILL-3987: Componentize Drill, extracting vec...

2015-11-11 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/250#issuecomment-155954144
  
Hey Chris, let's move this discussion to the pull request and/or patches 
that you're talking about. It is clear that you're frustrated that your 
allocator branch hasn't yet received a +1. Let's discuss it in the context of 
the changes themselves. DRILL-3940 is mentioned above and I have provided 
feedback on that JIRA. What other patches/pull requests and JIRAs do you feel 
like are ready to be merged that we can discuss in a more concrete manner?




[GitHub] drill pull request: DRILL-3987: Componentize Drill, extracting vec...

2015-11-11 Thread cwestin
Github user cwestin commented on the pull request:

https://github.com/apache/drill/pull/250#issuecomment-155948627
  
On 2.2 above: This description of events is not entirely accurate. We have 
never held up releases for the allocator work. Not once, never mind numerous 
occasions. For two releases during which it was not ready, I constructed 
special extract patches that included bug fixes only, because we believed it
would help the project. I did ask for people to hold off commits exactly 
once, for 24 hours. And once I found a problem (the rebase didn't go well), I 
took the hold off after about 18 hours.

As for merging patches in the order they are ready: the original allocator 
patch was ready about 3 months ago now. All the tests passed. The problem was 
that you refused to review it because it was too large. It was 213 files, less 
than half the size of this patch. Much like this one, there were "virtually no 
functional changes in this patch and its purpose is extremely narrow in scope." 
A large number of files had minor fixes that removed warnings, because these 
were helpful in finding bugs (biggest example: making a number of things 
AutoCloseables because then the compiler tells you when you leak resources -- 
this found a number of problems). The core changes are in about a dozen files, 
mainly the rewrite of the allocator itself.
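The "make things AutoCloseable so resource leaks get flagged" idea can be illustrated language-neutrally: a resource records whether it was closed, and a scope-exit mechanism (Java's try-with-resources, Python's `with`) closes it automatically, so anything still open afterwards is a leak you can report. A toy sketch, not Drill's actual allocator code:

```python
class TrackedResource:
    """Toy stand-in for a closeable resource: anything not closed is
    reported instead of silently leaking, analogous to the stricter
    allocator finding leaks the old one missed."""
    open_resources = []

    def __init__(self, name):
        self.name = name
        TrackedResource.open_resources.append(self)

    def close(self):
        TrackedResource.open_resources.remove(self)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

def leaked():
    # Names of every resource that was created but never closed
    return [r.name for r in TrackedResource.open_resources]

with TrackedResource("buffer-a"):
    pass                      # closed automatically, like try-with-resources
TrackedResource("buffer-b")   # never closed: a leak the tracker can report
print(leaked())               # ['buffer-b']
```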

You asked, and I pointed you at those core changes. I heard nothing back 
for a while, until you eventually asked that I break it up into smaller patches 
-- but still no comments on the allocator itself. Because of the layers of 
changes and dependencies, that was painful, but I did break things up. The 
first four were ready for review in mid-August. We exchanged messages and I 
told you where they were; when I came back from vacation in September, they had 
still not been reviewed. I made a stink at MapR, and some people finally 
reviewed them. I continued making patches (several more) which I continued to 
have a hard time getting people to review. When you finally did look at one, 
you immediately rejected it just because it didn't have its own JIRA. I 
submitted a JIRA for it within an hour of that, and it, like the others, 
continued to be ignored. I finally badgered more folks at MapR to review them.

Exactly once (not "numerous" times), at the time we ran the performance 
regression (a magical thing that only folks at MapR can do), we found that 
there was a performance penalty. I wouldn't call it a "concurrency design 
challenge." Yes, when you synchronize variables to get transacted behavior 
(memory allocation is really a balance transfer), instead of just depending on 
volatile variables that can change under your feet if you refer to them more 
than once, things are going to be slower. But you can do more.
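The "memory allocation is really a balance transfer" point can be sketched concretely: the debit and the credit must change as one atomic unit, which is why a lock (the transacted, slower option) is used instead of two independently visible, volatile-style fields. An illustrative sketch under that framing, not Drill's actual allocator:

```python
import threading

class Accounts:
    """Memory accounting as a balance transfer: the two balances must
    change atomically, so a reader never observes a moment where bytes
    have left one pool but not yet arrived in the other."""
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.lock = threading.Lock()

    def transfer_unlocked(self, amount):
        # 'volatile-style': each field is individually visible, but the pair
        # is not transacted; a concurrent reader between these two lines can
        # see a + b != original total.
        self.a -= amount
        self.b += amount

    def transfer_locked(self, amount):
        # Transacted: slower (the kind of cost behind the ~10% penalty
        # mentioned above), but the a + b invariant always holds.
        with self.lock:
            self.a -= amount
            self.b += amount

    def total_locked(self):
        with self.lock:
            return self.a + self.b

acc = Accounts(100, 0)
acc.transfer_locked(30)
print(acc.a, acc.b)  # 70 30
```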

In order to avoid this continual rebasing to track the project, I went back 
and made it possible for the new and old allocators to co-exist, so that the 
code would at least be committed while the performance penalty could be 
addressed (in order to avoid more rebasing in the future). This would also 
allow the use of the new allocator when assertions are enabled (or another 
debug flag), so that we could find any new problems; many of my rebases took 
time because I kept finding new problems introduced by ongoing work because the 
new allocator is much stricter than the old one.

You insisted that if others could just look at it, you were sure that a 
faster solution would be found; you asked for pointers to the code. I gave them 
to you. I never heard anything back. In any case, it is set up to only be used 
in debug mode for the moment, and can stay that way until someone can find a 
way to make it faster (and I explained the multiple implementations I tried 
before I finally got it down to the ~10% performance penalty it currently 
carries).

I continued breaking things down in an attempt to get them in. We are 
finally down to the last two patches.

Now I see a larger patch with the same characteristics, and it feels like 
it is being bulldozed through, even though there aren't clear benefits to 
having it. Your diagrams don't justify it, nor explain why it will be better. 
The new allocator, even if only used in debug mode for now, will help engineers 
avoid creating new leaks that the old allocator doesn't detect. Given some 
other suggestions I've seen to change the vector class interfaces, it seems 
like they're not stable enough to be broken out into a separate project yet; 
that will complicate changes.





[jira] [Created] (DRILL-4074) "numFiles" in the explain plan should always reflect the files read.

2015-11-11 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4074:


 Summary: "numFiles" in the explain plan should always reflect the 
files read.
 Key: DRILL-4074
 URL: https://issues.apache.org/jira/browse/DRILL-4074
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=e78e286

When the planner decides to read a whole directory (with more than one file), 
the numFiles attribute in the explain plan output gives "1", which is wrong. It 
should instead give the total number of files within the directory.

Our extended tests have quite a few partition pruning tests which rely on this 
field in the plan.
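A correct `numFiles` for a directory scan is simply the count of data files actually selected under the scanned directory. An illustrative sketch (file extensions and layout are assumptions, not Drill's actual scan code):

```python
import os
import tempfile

def num_files(root, extensions=(".parquet", ".csv", ".json")):
    # Count every data file under the scanned directory, recursively;
    # returning 1 for a multi-file directory is the bug DRILL-4074 reports.
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += sum(1 for f in filenames if f.endswith(extensions))
    return count

# Demo: a directory with three data files should report numFiles = 3
root = tempfile.mkdtemp()
for name in ("0_0_0.parquet", "0_0_1.parquet", "0_0_2.parquet"):
    open(os.path.join(root, name), "w").close()
print(num_files(root))  # 3
```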





Re: select from table with options

2015-11-11 Thread Julien Le Dem
Hi,
I've been trying to enable this, but it looks like in the current grammar
(before my change) you cannot use table functions and EXTEND together.
That's because they are on different branches of an | in the grammar.
So I would suggest that we treat those as two separate improvements in two
different pull requests:
 - not require table(...) to call table functions
 - allow using table functions and extend together.
Does that make sense?
Julien


On Tue, Nov 10, 2015 at 12:51 PM, Julian Hyde  wrote:

> To be clear, it should be possible to use a table function with all of
> the options -- EXTENDS clause, OVER clause, AS with alias and column
> aliases, TABLESAMPLE.
>
> I'm surprised that the parser didn't need more lookahead to choose
> between 't (x, y)' and 't (x INTEGER, y DATE)'.
>
> On Tue, Nov 10, 2015 at 12:28 PM, Julien Le Dem  wrote:
> > In the patch I just sent, probably not.
> > I will adjust it and add the corresponding test.
> >
> > On Tue, Nov 10, 2015 at 11:51 AM, Julian Hyde  wrote:
> >
> >> Can you use both together? Say
> >>
> >>   select columns
> >>   from dfs.`/path/to/myfile`(type => 'TEXT', fieldDelimiter => '|')
> EXTEND
> >> (foo INTEGER)
> >>
> >> Julian
> >>
> >>
> >>
> >> > On Nov 10, 2015, at 10:51 AM, Julien Le Dem 
> wrote:
> >> >
> >> > I took a stab at adding the TableFunction syntax without table(...) in
> >> > Calcite.
> >> > I have verified that both the table function and extend (with or
> without
> >> > keyword) work
> >> >
> >>
> https://github.com/julienledem/calcite/commit/b18f335c49e273294c2d475e359c610aaed3da34
> >> >
> >> > These work:
> >> >
> >> > select columns from dfs.`/path/to/myfile`(type => 'TEXT',
> fieldDelimiter
> >> =>
> >> > '|')
> >> >
> >> > select columns from table(dfs.`/path/to/myfile`(type => 'TEXT',
> >> > fieldDelimiter => '|'))
> >> >
> >> > select columns from table(dfs.`/path/to/myfile`('JSON'))
> >> >
> >> > select columns from dfs.`/path/to/myfile`('JSON')
> >> >
> >> > select columns from dfs.`/path/to/myfile`(type => 'JSON')
> >> >
> >> > On Sat, Nov 7, 2015 at 5:15 PM, Jacques Nadeau 
> >> wrote:
> >> >
> >> >> Drill does implicitly what Phoenix does explicitly so I don't think
> we
> >> >> should constrain ourselves to having a union of the two syntaxes.
> >> >>
> >> >>
> >> >> That being said, I think we could make these work together... maybe.
> >> >>
> >> >> Remove the EXTENDS without keyword syntax from the grammar.
> >> >>
> >> >> Create a new sub block in the table block that requires no keyword.
> >> There
> >> >> would be two paths (and would probably require some lookahead)
> >> >>
> >> >> option 1> unnamed parameters (1,2,3)
> >> >> option 2> named parameters (a => 1, b=>2, c=> 3)
> >> >> option 3> create table field pattern (favoriteBand VARCHAR(100),
> >> >> golfHandicap INTEGER)
> >> >>
> >> >> Then we create a table function with options 1 & 2, an EXTENDS clause
> >> for
> >> >> option 3.
> >> >>
> >> >> Best of both worlds?
> >> >>
> >> >> On Sat, Nov 7, 2015 at 4:44 PM, James Taylor  >
> >> >> wrote:
> >> >>
> >> >>> Phoenix already supports columns at read-time using the syntax
> without
> >> >> the
> >> >>> EXTENDS keyword as Julian indicated:
> >> >>>   SELECT * FROM Emp (favoriteBand VARCHAR(100), golfHandicap
> INTEGER)
> >> >>>   WHERE golfHandicap < 10;
> >> >>>
> >> >>> Changing this by requiring the EXTENDS keyword would create a
> backward
> >> >>> compatibility problem.
> >> >>>
> >> >>> I think it'd be good if both of these extensions worked in Drill &
> >> >> Phoenix
> >> >>> given our Drillix initiative.
> >> >>>
> >> >>> On Sat, Nov 7, 2015 at 3:34 PM, Jacques Nadeau 
> >> >> wrote:
> >> >>>
> >>  My proposal was an a or b using the freemarker template in the
> >> grammar,
> >>  not something later.
> >> 
> >>  Actually, put another way: we may want to consider stating that we
> >> only
> >>  incorporate SQL standards in our primary grammar. Any extensions
> >> should
> >> >>> be
> >>  optional grammar. We could simply have grammar plugins in Calcite
> (the
> >> >>> same
> >>  way we plug in external things in Drill).
> >> 
> >>  Trying to get every project to agree on extensions seems like it
> may
> >> be
> >>  hard.
> >> 
> >> 
> >> 
> >>  --
> >>  Jacques Nadeau
> >>  CTO and Co-Founder, Dremio
> >> 
> >>  On Sat, Nov 7, 2015 at 2:45 PM, Julian Hyde 
> wrote:
> >> 
> >> > I can see why Jacques wants this syntax.
> >> >
> >> > However a “switch" in a grammar is a bad idea. Grammars need to be
> >> > predictable. Any variation should happen at validation time, or
> >> later.
> >> >
> >> > Also, we shouldn’t add configuration parameters as a way of
> avoiding
> >> a
> >> > tough design discussion.
> >> >
> >> > EXTENDS and eliding TABLE are both extensions to standard SQL, and
> >> >> they
> >> > are both applicable to Drill and Phoenix. I think Drill and
> Phoenix
> >> >> (by
> >> > w

Re: What is supposed to happen if SCALAR_REPLACEMENT does not work?

2015-11-11 Thread Hsuan Yi Chu
So Drill does not move forward when it should have.

It is a bug. Thanks Jacques.

On Wed, Nov 11, 2015 at 1:11 PM, Jacques Nadeau  wrote:

> There are two expected behaviors:
>
> - Scalar replacement fails in the Drill distribution: ignore, log warning
> and move forward.
> - Scalar replacement fails in unit tests: query should fail (so we don't
> inject new problems into the codebase unexpectedly)
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Nov 11, 2015 at 12:04 PM, Hsuan Yi Chu 
> wrote:
>
> > Hi Jacques and Chris,
> >
> > If SCALAR_REPLACEMENT fails, should drill just fall back (by setting
> > SCALAR_REPLACEMENT_OPTION = try)?
> >
> > The default behavior for now is that query will just fail if
> > SCALAR_REPLACEMENT does not work.
> >
> > A case can be found here:
> > https://issues.apache.org/jira/browse/DRILL-3854
> >
>


Re: What is supposed to happen if SCALAR_REPLACEMENT does not work?

2015-11-11 Thread Jacques Nadeau
There are two expected behaviors:

- Scalar replacement fails in the Drill distribution: ignore, log warning
and move forward.
- Scalar replacement fails in unit tests: query should fail (so we don't
inject new problems into the codebase unexpectedly)

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Nov 11, 2015 at 12:04 PM, Hsuan Yi Chu  wrote:

> Hi Jacques and Chris,
>
> If SCALAR_REPLACEMENT fails, should drill just fall back (by setting
> SCALAR_REPLACEMENT_OPTION = try)?
>
> The default behavior for now is that query will just fail if
> SCALAR_REPLACEMENT does not work.
>
> A case can be found here:
> https://issues.apache.org/jira/browse/DRILL-3854
>
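The two expected behaviors Jacques lists amount to a try-with-fallback that is strict only under tests. A sketch of that pattern with hypothetical function names (only the option name `SCALAR_REPLACEMENT_OPTION` comes from the thread):

```python
import logging

def scalar_replace(source):
    # Stand-in for the bytecode rewrite; assume it can fail on some inputs
    if "bad" in source:
        raise ValueError("scalar replacement failed")
    return "optimized:" + source

def compile_code(source, in_unit_test=False):
    try:
        return scalar_replace(source)
    except ValueError:
        if in_unit_test:
            raise  # unit tests: fail loudly so regressions are caught
        # distribution: log a warning and move forward without the rewrite
        logging.warning("scalar replacement failed; falling back")
        return "plain:" + source

print(compile_code("good code"))  # optimized:good code
print(compile_code("bad code"))   # plain:bad code
```

Under this shape, `SCALAR_REPLACEMENT_OPTION = try` corresponds to the non-test path: attempt the rewrite, fall back with a logged warning instead of failing the query.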


What is supposed to happen if SCALAR_REPLACEMENT does not work?

2015-11-11 Thread Hsuan Yi Chu
Hi Jacques and Chris,

If SCALAR_REPLACEMENT fails, should drill just fall back (by setting
SCALAR_REPLACEMENT_OPTION = try)?

The default behavior for now is that query will just fail if
SCALAR_REPLACEMENT does not work.

A case can be found at here:
https://issues.apache.org/jira/browse/DRILL-3854


[jira] [Resolved] (DRILL-4061) Incorrect results returned by window function query.

2015-11-11 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz resolved DRILL-4061.
---
Resolution: Not A Problem

> Incorrect results returned by window function query.
> 
>
> Key: DRILL-4061
> URL: https://issues.apache.org/jira/browse/DRILL-4061
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.3.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
> Attachments: 0_0_0.parquet
>
>
> Window function query that uses lag function returns incorrect results.
> sys.version => 3a73f098
> Drill 1.3
> Test parquet file is attached here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE testrepro AS SELECT 
> CAST(columns[0] AS INT) col0, CAST(columns[1] AS INT) col1 FROM 
> `testRepro.csv`;
> +---++
> | Fragment  | Number of records written  |
> +---++
> | 0_0   | 11 |
> +---++
> 1 row selected (0.542 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col1, 1 / (col1 - lag(col1) OVER (ORDER 
> BY col0)) from testrepro;
> +---+-+
> | col1  | EXPR$1  |
> +---+-+
> | 11| null|
> | 9 | 0   |
> | 0 | 0   |
> | 10| 0   |
> | 19| 0   |
> | 13| 0   |
> | 17| 0   |
> | -1| 0   |
> | 1 | 0   |
> | 20| 0   |
> | 100   | 0   |
> +---+-+
> 11 rows selected (0.451 seconds)
> {code}
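The "Not A Problem" resolution is consistent with SQL integer division: both operands of `1 / (col1 - lag(col1) ...)` are INT, so every nonzero difference truncates to 0, and the first row is NULL because there is no previous row. A quick check of the arithmetic (assuming the displayed row order is the window order):

```python
def int_div(a, b):
    # SQL-style integer division truncates toward zero
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

col1 = [11, 9, 0, 10, 19, 13, 17, -1, 1, 20, 100]  # window order (assumed)
# lag(col1) has no previous value on the first row, so EXPR$1 is NULL there
expr = [None] + [int_div(1, col1[i] - col1[i - 1]) for i in range(1, len(col1))]
print(expr)  # [None, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

This reproduces the query output exactly: one NULL followed by ten zeros.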





[jira] [Created] (DRILL-4073) Parquet RecordReader fails to read a file on s3

2015-11-11 Thread amit hadke (JIRA)
amit hadke created DRILL-4073:
-

 Summary: Parquet RecordReader fails to read a file on s3
 Key: DRILL-4073
 URL: https://issues.apache.org/jira/browse/DRILL-4073
 Project: Apache Drill
  Issue Type: Bug
Reporter: amit hadke


Parquet Reader fails with the exception "FAILED_TO_UNCOMPRESS".
The same file can be read when copied to HDFS.

Attaching file.

Query ran:
select key1 from s3.`parquet_20m_s3r1` limit 1;

S3 was configured with an s3n:// connection.

 - drillbit log --

fragment 0:0

[Error Id: 72b14374-e7f0-4507-9593-a54a883701ca on 
ip-172-31-39-61.us-west-2.compute.internal:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IOException: 
FAILED_TO_UNCOMPRESS(5)

Fragment 0:0

[Error Id: 72b14374-e7f0-4507-9593-a54a883701ca on 
ip-172-31-39-61.us-west-2.compute.internal:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
 ~[drill-common-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
 [drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178)
 [drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
 [drill-java-exec-1.2.0.jar:1.2.0]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.2.0.jar:1.2.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_85]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_85]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in 
parquet record reader.
Message: 
Hadoop path: /testdata/parquet_20m_s3r1/0_0_0.parquet
Total records read: 0
Mock records read: 0
Records to read: 32768
Row group index: 0
Records in row group: 1
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
  optional binary key19 (UTF8);
  optional binary key18 (UTF8);
  optional binary key17 (UTF8);
  optional binary key16 (UTF8);
  optional binary key15 (UTF8);
  optional binary key14 (UTF8);
  optional binary key13 (UTF8);
  optional binary key12 (UTF8);
  optional binary key11 (UTF8);
  optional binary key10 (UTF8);
  optional binary key97 (UTF8);
  optional binary key96 (UTF8);
  optional binary key95 (UTF8);
  optional binary key94 (UTF8);
  optional binary key93 (UTF8);
  optional binary key92 (UTF8);
  optional binary key91 (UTF8);
  optional binary key90 (UTF8);
  optional binary key99 (UTF8);
  optional binary key98 (UTF8);
  optional binary key80 (UTF8);
  optional binary key81 (UTF8);
  optional binary key82 (UTF8);
  optional binary key83 (UTF8);
  optional binary key84 (UTF8);
  optional binary key85 (UTF8);
  optional binary key86 (UTF8);
  optional binary key87 (UTF8);
  optional binary key88 (UTF8);
  optional binary key89 (UTF8);
  optional binary key79 (UTF8);
  optional binary key78 (UTF8);
  optional binary key75 (UTF8);
  optional binary key74 (UTF8);
  optional binary key77 (UTF8);
  optional binary key76 (UTF8);
  optional binary key71 (UTF8);
  optional binary key70 (UTF8);
  optional binary key73 (UTF8);
  optional binary key72 (UTF8);
  optional binary key68 (UTF8);
  optional binary key69 (UTF8);
  optional binary key66 (UTF8);
  optional binary key67 (UTF8);
  optional binary key64 (UTF8);
  optional binary key65 (UTF8);
  optional binary key62 (UTF8);
  optional binary key63 (UTF8);
  optional binary key60 (UTF8);
  optional binary key61 (UTF8);
  optional binary key100 (UTF8);
  optional binary key59 (UTF8);
  optional binary key58 (UTF8);
  optional binary key53 (UTF8);
  optional binary key52 (UTF8);
  optional binary key51 (UTF8);
  optional binary key50 (UTF8);
  optional binary key57 (UTF8);
  optional binary key56 (UTF8);
  optional binary key55 (UTF8);
  optional binary key54 (UTF8);
  optional binary key44 (UTF8);
  optional binary key45 (UTF8);
  optional binary key46 (UTF8);
  optional binary key47 (UTF8);
  optional binary key40 (UTF8);
  optional binary key41 (UTF8);
  optional binary key42 (UTF8);
  optional binary key43 (UTF8);
  optional binary key48 (UTF8);
  optional binary key49 (UTF8);
  optional binary key9 (UTF8);
  optional binary key8 (UTF8);
  optional binary key3 (UTF8);
  optional binary key2 (UTF8);
  optional binary key1 (UTF8);
  optional binary key7 (UTF8);
  optional binary key6 (UTF8);
  optional binary key5 (UTF8);
  optional binary key4 (UTF8);
  optional binary key31 (UTF8);
  optional binary key30 (UTF8);
  optional binary key33 (UTF8);
  optional binary key32 (UTF8);
  optional binary key35 (UTF8);
  optional binary key34 (UTF8);
  optional binary key37 (UTF8);
  op

[jira] [Created] (DRILL-4072) Hive partition pruning not working with avro serde's

2015-11-11 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4072:


 Summary: Hive partition pruning not working with avro serde's
 Key: DRILL-4072
 URL: https://issues.apache.org/jira/browse/DRILL-4072
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=e78e286

The below plan indicates that partition pruning is not happening

{code}
explain plan for select * from hive.episodes_partitioned where doctor > 4;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(title=[$0], air_date=[$1], doctor=[$2], doctor_pt=[$3])
00-02Project(title=[$0], air_date=[$1], doctor=[$2], doctor_pt=[$3])
00-03  SelectionVectorRemover
00-04Filter(condition=[>($2, 4)])
00-05  Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:episodes_partitioned), 
inputSplits=[maprfs:///user/hive/warehouse/episodes_partitioned/doctor_pt=1/00_0:0+367,
 
maprfs:///user/hive/warehouse/episodes_partitioned/doctor_pt=11/00_0:0+393, 
maprfs:///user/hive/warehouse/episodes_partitioned/doctor_pt=2/00_0:0+371, 
maprfs:///user/hive/warehouse/episodes_partitioned/doctor_pt=4/00_0:0+368, 
maprfs:///user/hive/warehouse/episodes_partitioned/doctor_pt=5/00_0:0+357, 
maprfs:///user/hive/warehouse/episodes_partitioned/doctor_pt=6/00_0:0+370, 
maprfs:///user/hive/warehouse/episodes_partitioned/doctor_pt=9/00_0:0+350], 
columns=[`*`], numPartitions=7, partitions= [Partition(values:[1]), 
Partition(values:[11]), Partition(values:[2]), Partition(values:[4]), 
Partition(values:[5]), Partition(values:[6]), Partition(values:[9])]]])
{code}

I attached the data file and the HQL required. Let me know if anything else is 
needed.





[GitHub] drill pull request: DRILL-3987: Componentize Drill, extracting vec...

2015-11-11 Thread cwestin
Github user cwestin commented on the pull request:

https://github.com/apache/drill/pull/250#issuecomment-155847895
  
I've sent you the location of the branch many times; the remaining changes 
are still there: https://github.com/cwestin/incubator-drill/tree/alloc . After 
DRILL-3940 is merged (and then rebased back into that branch), it would be down 
to the last 30 files, which I don't think could easily be broken down into 
smaller patches. These changes collectively found a number of bugs, which 
I fixed over time. They would also prevent more of the same kind in the future. 
I don't understand how it can appear to be based on Drill 0.9 when it was 
rebased just about a month ago, on 1.2.




[jira] [Resolved] (DRILL-759) Drill does not support date + interval

2015-11-11 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal resolved DRILL-759.
---
Resolution: Fixed

This is now fixed.

> Drill does not support date + interval
> --
>
> Key: DRILL-759
> URL: https://issues.apache.org/jira/browse/DRILL-759
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: Future
>Reporter: Krystal
>  Labels: interval
> Fix For: Future
>
>
> Right now drill only supports date addition/subtraction via the date_add function. 
>  It should also support date + interval x, as this method is used by other 
> databases such as Postgres and Oracle.





Re: [jira] [Created] (DRILL-4056) Avro deserialization

2015-11-11 Thread Stefán Baxter
Hi,

I have a) confirmed this behavior with more data and the latest 1.3, and b)
submitted a test file to the Jira ticket.

This affects all string-based data fetched from Avro files (at least for me).

I think this should be considered a blocker for 1.3.

Regards,
 -Stefán


On Tue, Nov 10, 2015 at 2:40 PM, Stefán Baxter (JIRA) 
wrote:

> Stefán Baxter created DRILL-4056:
> 
>
>  Summary: Avro deserialization
>  Key: DRILL-4056
>  URL: https://issues.apache.org/jira/browse/DRILL-4056
>  Project: Apache Drill
>   Issue Type: Bug
>   Components: Storage - Other
> Affects Versions: 1.3.0
>  Environment: Ubuntu 15.04 - Oracle Java
> Reporter: Stefán Baxter
>  Fix For: 1.3.0
>
>
> I have an Avro file that supports the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
>
> When I select 10 rows from this file I get:
> +-+
> |   EXPR$0|
> +-+
> | Gæst|
> | Voksen  |
> | Voksen  |
> | Invitation KIF KBH  |
> | Invitation KIF KBH  |
> | Ordinarie pris KBH  |
> | Ordinarie pris KBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> | Biljetter 200 krBH  |
> +-+
>
> The bug is that the field values are incorrectly de-serialized and the
> value from the previous row is retained if the subsequent row is shorter.
>
> The sql query:
> "select s.classification.variant variant from dfs. as s limit 10;"
>
> That way the  "Ordinarie pris" becomes "Ordinarie pris KBH" because the
> previous row had the value "Invitation KIF KBH".
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>