[jira] [Resolved] (DRILL-4165) IllegalStateException in MergeJoin for a query against TPC-DS data

2015-12-08 Thread Venki Korukanti (JIRA)

 [ https://issues.apache.org/jira/browse/DRILL-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-4165.

   Resolution: Fixed
Fix Version/s: 1.4.0

> IllegalStateException in MergeJoin for a query against TPC-DS data
> --
>
> Key: DRILL-4165
> URL: https://issues.apache.org/jira/browse/DRILL-4165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: amit hadke
> Fix For: 1.4.0
>
>
> I am seeing the following on the 1.4.0 branch. 
> {noformat}
> 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
> ..
> 0: jdbc:drill:zk=local> select count(*) from dfs.`tpcds/store_sales` ss1, 
> dfs.`tpcds/store_sales` ss2 where ss1.ss_customer_sk = ss2.ss_customer_sk and 
> ss1.ss_store_sk = 1 and ss2.ss_store_sk = 2;
> Error: SYSTEM ERROR: IllegalStateException: Incoming batch [#55, 
> MergeJoinBatch] has size 1984616, which is beyond the limit of 65536
> Fragment 0:0
> [Error Id: 18bf00fe-52d7-4d84-97ec-b04a035afb4e on 192.168.1.103:31010]
>   (java.lang.IllegalStateException) Incoming batch [#55, MergeJoinBatch] has 
> size 1984616, which is beyond the limit of 65536
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():305
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> {noformat}
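
For context, the check that fires here is Drill's iterator validation: a batch may hold at most 65536 records, because selection vectors address rows within a batch using 16-bit indices. A minimal sketch of that invariant, with illustrative names (not Drill's actual IteratorValidatorBatchIterator code):
{noformat}
final class BatchSizeValidator {
  // 2^16: the most rows a selection vector can address within one batch.
  static final int MAX_BATCH_SIZE = 1 << 16;

  static void validate(String batchDesc, int recordCount) {
    if (recordCount > MAX_BATCH_SIZE) {
      throw new IllegalStateException(String.format(
          "Incoming batch [%s] has size %d, which is beyond the limit of %d",
          batchDesc, recordCount, MAX_BATCH_SIZE));
    }
  }
}
{noformat}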





[jira] [Created] (DRILL-4172) Need stop, port as startup parameters in case Drill is installed as a Windows service

2015-12-08 Thread Sudip Mukherjee (JIRA)
Sudip Mukherjee created DRILL-4172:
--

 Summary: Need stop, port as startup parameters in case Drill is 
installed as a Windows service
 Key: DRILL-4172
 URL: https://issues.apache.org/jira/browse/DRILL-4172
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.3.0
 Environment: Windows
Reporter: Sudip Mukherjee
 Fix For: Future


I am trying to install Drill with procrun on Windows Server as a persistent 
service, rather than by running the batch file.
I need start, stop, and port parameters so that the service can be 
started/stopped from the Windows services.msc console.

Does it make sense to introduce these as optional startup parameters?
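
For what it's worth, Apache Commons Daemon's prunsrv invokes static start/stop methods named via --StartClass/--StartMethod and --StopClass/--StopMethod, so the request amounts to exposing entry points like the following. This is only a sketch of the idea; the class name, the flag parsing, and the elided Drillbit calls are assumptions, not Drill code:
{noformat}
public class DrillWindowsService {
  private static final Object LOCK = new Object();
  private static volatile boolean running;

  // Invoked by prunsrv --StartMethod, e.g. --StartParams=--port;31010
  public static void start(String[] args) throws InterruptedException {
    int port = 31010; // Drill's default user port
    for (int i = 0; i < args.length - 1; i++) {
      if ("--port".equals(args[i])) {
        port = Integer.parseInt(args[i + 1]);
      }
    }
    System.out.println("Starting embedded Drillbit on port " + port);
    // ... start the embedded Drillbit here ...
    running = true;
    synchronized (LOCK) {
      while (running) {
        LOCK.wait(); // procrun keeps this thread alive as the service main
      }
    }
  }

  // Invoked by prunsrv --StopMethod when services.msc stops the service.
  public static void stop(String[] args) {
    // ... shut the Drillbit down here ...
    synchronized (LOCK) {
      running = false;
      LOCK.notifyAll();
    }
  }
}
{noformat}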





Re: Parquet pushdown filtering

2015-12-08 Thread Adam Gilmore
That makes sense, yep.  The problem, I guess, is with my implementation.  I
will iterate through all Parquet files and try to eliminate the ones where
the filter conflicts with the statistics.  In instances where no files match
the filter, I end up with an empty set of files for the Parquet scan to
iterate through.  I suppose I could just pick the schema of the first file
or something, but that seems like a pretty messy rule.

Julien - I'd be happy to have a chat about this.  I've pretty much got the
implementation down, but need to solve a few of these little issues.
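
To make the elimination step concrete, here is a minimal sketch of pruning row groups against Parquet statistics with the parquet-mr metadata classes; the method name, the long-typed equality filter, and the all-nulls handling are illustrative assumptions, not the actual patch:
{noformat}
import java.util.ArrayList;
import java.util.List;

import org.apache.parquet.column.statistics.LongStatistics;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;

public class RowGroupPruner {
  /** Keeps only row groups whose statistics say they might contain 'value'. */
  static List<BlockMetaData> pruneRowGroups(
      List<BlockMetaData> rowGroups, String column, long value) {
    List<BlockMetaData> kept = new ArrayList<>();
    for (BlockMetaData rowGroup : rowGroups) {
      boolean canMatch = true;
      for (ColumnChunkMetaData chunk : rowGroup.getColumns()) {
        if (!chunk.getPath().toDotString().equals(column)
            || !(chunk.getStatistics() instanceof LongStatistics)) {
          continue; // not the filter column, or no usable statistics
        }
        LongStatistics stats = (LongStatistics) chunk.getStatistics();
        // Parth's point below: the row group's rowCount is also the value
        // count for each column, so "all nulls" is rowCount == numNulls.
        boolean allNulls = stats.getNumNulls() == rowGroup.getRowCount();
        if (allNulls || value < stats.getMin() || value > stats.getMax()) {
          canMatch = false; // the filter conflicts with the statistics
        }
      }
      if (canMatch) {
        kept.add(rowGroup);
      }
    }
    return kept;
  }
}
{noformat}
If every file is pruned this way, the scan is left with the empty file set described above, which is exactly where the no-empty-batch limitation bites.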


On Fri, Dec 4, 2015 at 5:22 AM, Hanifi GUNES  wrote:

> Regarding your point #1: I guess Daniel struggled with this limitation as
> well. I merged a few of his patches, which addressed empty-batch (no data)
> handling in various places during execution. That said, we still have not
> had time to develop a solid way to handle empty batches with no schema.
>
> *- Scan batches don't allow empty batches.  This means if a
> particular filter filters out *all* rows, we get an exception.*
> It looks to me like you are referring to no data rather than no schema
> here. I would expect graceful execution in this case. Do you mind sharing
> a simple reproduction?
>
>
> -Hanifi
>
> 2015-12-03 10:56 GMT-08:00 Julien Le Dem :
>
> > Hey Adam,
> > If you have questions about the Parquet side of things, I'm happy to
> > chat.
> > Julien
> >
> > On Tue, Dec 1, 2015 at 10:20 PM, Parth Chandra  wrote:
> >
> > > Parquet metadata has the rowCount for every rowGroup, which is also
> > > the value count for every column in the rowGroup. Isn't that what
> > > you need?
> > >
> > > On Tue, Dec 1, 2015 at 10:10 PM, Adam Gilmore  wrote:
> > >
> > > > Hi guys,
> > > >
> > > > I'm trying to (re)implement pushdown filtering for Parquet with
> > > > the new Parquet metadata caching implementation.
> > > >
> > > > I've run into a couple of challenges:
> > > >
> > > >    1. Scan batches don't allow empty batches.  This means if a
> > > >    particular filter filters out *all* rows, we get an exception.
> > > >    I haven't read the full comments on the relevant JIRA items,
> > > >    but it seems odd that we can't query an empty JSON file, for
> > > >    example.  This is a bit of a blocker to implementing the
> > > >    pushdown filtering properly.
> > > >    2. The Parquet metadata doesn't include all the relevant
> > > >    metadata.  Specifically, the count of values is not included;
> > > >    therefore the default Parquet statistics filter has issues,
> > > >    because it compares the count of values with the count of nulls
> > > >    to work out whether it can drop a block.  This isn't necessarily
> > > >    a blocker, but it feels ugly simulating that there's "1" row in
> > > >    a block (just to get around the null comparison).
> > > >
> > > > Also, it feels a bit ugly rehydrating the standard Parquet metadata
> > > > objects manually.  I'm not sure I understand why we created our own
> > > > objects for the Parquet metadata as opposed to simply writing a
> > > > custom serializer for those objects which we store.
> > > >
> > > > Thoughts would be great - I'd love to get a patch out for this.
> > > >
> > >
> >
> >
> > --
> > Julien
> >
>


[jira] [Created] (DRILL-4173) Query did not return all documents if collection using a hashed shard key

2015-12-08 Thread Yuqing Tang (JIRA)
Yuqing Tang created DRILL-4173:
--

 Summary: Query did not return all documents if collection using a 
hashed shard key
 Key: DRILL-4173
 URL: https://issues.apache.org/jira/browse/DRILL-4173
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - MongoDB
Affects Versions: 1.3.0
 Environment: Windows 2012
Reporter: Yuqing Tang


MongoDB 3.0.6

If a collection uses a hashed shard key ({ "shardkey": "hashed" }), queries 
like "select * from ..." may not return all of the documents that should be 
returned from the collection.

Test case:
Create 3 mongos routers, 3 config servers, and 3 replica sets, each with 3
mongod instances.
Create one collection with a hashed shard key.
Insert 6 documents into this collection with shard key values 1, 2, 3, 4, 5, 6.
Do a query select * from 
Only 2, 3, 4 will be returned.
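
For reference, a hedged sketch of this reproduction using the MongoDB Java driver; the host, database, and collection names are assumptions, and the sharded cluster from the steps above is assumed to be running:
{noformat}
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class HashedShardKeyRepro {
  public static void main(String[] args) {
    MongoClient client = new MongoClient("localhost", 27017); // a mongos router
    try {
      // Shard the collection on a hashed key.
      client.getDatabase("admin")
          .runCommand(new Document("enableSharding", "testdb"));
      client.getDatabase("admin")
          .runCommand(new Document("shardCollection", "testdb.docs")
              .append("key", new Document("shardkey", "hashed")));

      MongoCollection<Document> docs =
          client.getDatabase("testdb").getCollection("docs");
      for (int i = 1; i <= 6; i++) {
        docs.insertOne(new Document("shardkey", i)); // shard key values 1..6
      }
      // Then, from Drill: select * from mongo.testdb.`docs`;
      // Per this report, only a subset (e.g. 2, 3, 4) comes back.
    } finally {
      client.close();
    }
  }
}
{noformat}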





[VOTE] Release Apache Drill 1.4.0 RC1

2015-12-08 Thread Venki Korukanti
Hi,

I'd like to propose the second release candidate of Apache Drill, version
1.4.0. It covers a total of 32 resolved JIRAs [1]. The fix for the MergeJoin
issue (DRILL-4165) found in RC0 is also included in RC1. Thanks to everyone
who contributed to this release.

The tarball artifacts are hosted at [2] and the maven artifacts are hosted at
[3]. This release candidate is based on commit
32b871b24c7b69f59a1d2e70f444eed6e599e825 located at [4].

The vote will be open for the next 72 hours ending at 8AM Pacific, December
11, 2015.

[ ] +1
[ ] +0
[ ] -1

Thanks
Venki

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332947&projectId=12313820
[2] http://people.apache.org/~venki/apache-drill-1.4.0.rc1
[3] https://repository.apache.org/content/repositories/orgapachedrill-1019/
[4] https://github.com/vkorukanti/drill/tree/1.4.0


Hangout starting

2015-12-08 Thread Jacques Nadeau
Hey All,

Hangout starting:

https://plus.google.com/hangouts/_/dremio.com/drillhangout?authuser=0

--
Jacques Nadeau
CTO and Co-Founder, Dremio


[GitHub] drill pull request: Drill 4127: Reduce Hive metastore client API c...

2015-12-08 Thread jinfengni
Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/286#issuecomment-162982945
  
Addressed Venki's comments in the revised patches. 




Re: Can we pass the #skipped records with RecordBatch?

2015-12-08 Thread Jacques Nadeau
Please see some initial thoughts attached. Would love feedback and thoughts
from others on how we can shape this.

https://gist.github.com/jacques-n/84b13e704e0e3829ca99

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Dec 3, 2015 at 8:17 AM, Zelaine Fong  wrote:

> Yes, it would be great to get your thoughts so we can assess the scope of
> what's involved.
>
> Thanks.
>
> -- Zelaine
>
> On Wed, Dec 2, 2015 at 7:29 PM, Jacques Nadeau  wrote:
>
> > Definitely agree that we shouldn't boil the ocean.  That said, I don't
> > think we should make RecordBatch interface changes without deliberate
> > design. Same for RPC protocol changes. Part of my internal struggle with
> > the warning patch is exactly this lack of broader design. I think this
> > is especially true given the drive to support backwards compatibility.
> >
> > I don't think we're talking about a massive undertaking. I'll try to
> > write up some thoughts later this week to get the ball rolling. Sound
> > good?
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> > +1 on having a framework.
> > OTOH, as with the warnings implementation, we might want to go ahead
> > with a simpler implementation while we get a more generic framework
> > design in place.
> >
> > Jacques, do you have any preliminary thoughts on the framework?
> >
> > On Tue, Dec 1, 2015 at 2:08 PM, Julian Hyde  wrote:
> >
> > > +1 for a sideband mechanism.
> > >
> > > Sideband can also allow correlated restart of sub-queries.
> > >
> > > In the sideband use cases you described, the messages ran in the
> > > opposite direction to the data. Would the sideband also run in the
> > > same direction as the data? If so it could carry warnings, rejected
> > > rows, progress indications, and (for online aggregation [1])
> > > notifications that a better approximate query result is available.
> > >
> > > Julian
> > >
> > > [1] https://en.wikipedia.org/wiki/Online_aggregation
> > >
> > >
> > >
> > > > On Dec 1, 2015, at 1:51 PM, Jacques Nadeau  wrote:
> > > >
> > > > This seems like a form of sideband communication. I think we should
> > > > have a framework for this type of thing in general rather than a
> > > > one-off for this particular need. Other forms of sideband might be
> > > > small table bloomfilter generation and pushdown into hbase, separate
> > > > file assignment/partitioning providers balancing/generating scanner
> > > > workloads, statistics generation for adaptive execution, etc.
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Tue, Dec 1, 2015 at 11:35 AM, Hsuan Yi Chu  wrote:
> > > >
> > > >> I am trying to deal with the following scenario:
> > > >>
> > > >> A bunch of minor fragments are doing things in parallel. Each of
> > > >> them could skip some records. Since the downstream minor fragment
> > > >> needs to know the sum of the skipped-record counts from the
> > > >> upstreams (in order to display it, or to see whether the number
> > > >> exceeds a threshold), each upstream minor fragment needs to pass
> > > >> this scalar with the RecordBatch.
> > > >>
> > > >> Since this seems to impact the protocol of RecordBatch, I am
> > > >> looking for some advice here.
> > > >>
> > > >> Thanks.
> > > >>
> > >
> > >
> >


Re: Parquet pushdown filtering

2015-12-08 Thread Julien Le Dem
Adam: do you want to schedule a hangout?



-- 
Julien


[jira] [Created] (DRILL-4174) fix for DRILL-4081 mistakenly regresses the fix for DRILL-3768

2015-12-08 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4174:
---

 Summary: fix for DRILL-4081 mistakenly regresses the fix for 
DRILL-3768
 Key: DRILL-4174
 URL: https://issues.apache.org/jira/browse/DRILL-4174
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.4.0
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim


As part of fixing DRILL-3768, ExternalSortBatch counts how many batches are 
held in memory in a local variable, "totalBatches", to make sure the sort 
spills to disk before there are too many batches for the SelectionVector4. 
The fix for DRILL-4081 removed the line that incremented totalBatches, which 
causes the previous fix to no longer work.
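
A minimal sketch of the guard in question; the class shape and names are illustrative assumptions, not Drill's actual ExternalSortBatch code. The SelectionVector4 packs a batch index and an in-batch record index into a single int, 16 bits each, so the sort must spill before it buffers more than 65536 batches:
{noformat}
import java.util.ArrayList;
import java.util.List;

public class SortSpillGuard<BATCH> {
  // An SV4 addresses batches with 16 bits, so at most 2^16 batches.
  private static final int MAX_BATCHES = 1 << 16;

  private final List<BATCH> inMemoryBatches = new ArrayList<>();
  private int totalBatches = 0; // the counter whose increment was lost

  public void addBatch(BATCH batch) {
    inMemoryBatches.add(batch);
    totalBatches++; // the line DRILL-4081's refactoring removed
    if (totalBatches >= MAX_BATCHES) {
      spill();
    }
  }

  private void spill() {
    // Placeholder: the real operator writes a sorted run to disk here.
    inMemoryBatches.clear();
    totalBatches = 0;
  }
}
{noformat}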







Re: Can we pass the #skipped records with RecordBatch?

2015-12-08 Thread Jacques Nadeau
inline

It seems that SidebandTunnel is point-to-point. That is, there is one
> producer and one consumer. No broadcast or topics (multiple consumers of
> the same message). Order is preserved. At-most-once (i.e. may lose data in
> event of failure). Producer and consumer may be on the same node or
> different nodes. Correct?
>

Yes, you are correct in all of this. Since we don't use UDP in Drill, we do
broadcast as a collection of individual p2p calls, all using the same
message (and multiple reference counts if using raw bytes).


>
> I’m not sure SidebandTunnel.close is necessary. I would presume that a
> SidebandTunnel is closed when its associated statement is closed, and only
> then.
>

I started without it. My thought was that we may need to signal that you've
received all of a sideband stream prior to the close of a particular
fragment. If I'm on the downstream side of an operation reporting multiple
skips, I may want to hold off on reporting to the user until I have all of
the messages. One option is for the sender to send a discrete message via
the tunnel close. The other option is an implicit message when the fragment
is completed. I like the latter from a cleanliness perspective but think the
former may be required. I'm OK with not exposing this at the tunnel level
publicly initially; we can always expose it later. I would love to hear
whether people think there is going to be a need/use case to continue
fragment operation but have another operator know that a sideband stream is
complete. Maybe when sending a downstream set of samples on the first 1MM
records of a larger scan?


> Also, would it be easier if the tunnels were defined as part of the DAG,
> and DAG initialization time was the only time that they could be created?
>

That is a really good question. I need to think about it a bit. I'm not
sure it is easier, given that my initial proposal is to piggyback on the
DataTunnel (which is independent of DAG initialization).  However, it might
be cleaner if operators have to declare this relationship at initialization
time and it is all managed 'outside'.

Thanks for the feedback. Will need to think further on your last point
especially.
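
To anchor the discussion, here is a hedged sketch of the tunnel semantics described in this thread (point-to-point, ordered, at-most-once, with the contested explicit close); these names are assumptions for illustration, not the API from the gist:
{noformat}
import java.io.Closeable;
import java.nio.ByteBuffer;

/** One producer, one consumer; messages arrive in order, at most once. */
interface SidebandTunnel extends Closeable {
  /** Fire-and-forget send; data may be lost if a fragment fails. */
  void send(ByteBuffer message);

  /**
   * Marks the stream complete. The open question above: is this explicit
   * close needed, or is fragment completion an implicit close?
   */
  @Override
  void close();
}

/** An example sideband payload: skipped-record counts from one upstream. */
final class SkippedRecords {
  final int minorFragmentId;
  final long skippedCount;

  SkippedRecords(int minorFragmentId, long skippedCount) {
    this.minorFragmentId = minorFragmentId;
    this.skippedCount = skippedCount;
  }
}
{noformat}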


>
> Julian

[jira] [Created] (DRILL-4175) calcite parse sql error

2015-12-08 Thread huntersjm (JIRA)
huntersjm created DRILL-4175:


 Summary: calcite parse sql error
 Key: DRILL-4175
 URL: https://issues.apache.org/jira/browse/DRILL-4175
 Project: Apache Drill
  Issue Type: Bug
 Environment: distribution
Reporter: huntersjm


I queried a SQL statement like `select v from table limit 1` and got an error:

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IndexOutOfBoundsException: Index: 68, Size: 67

After debugging, I found a bug in Calcite's parsing. First, look at line 72
of org.apache.calcite.rex.RexProgramBuilder:
{noformat}
   registerInternal(RexInputRef.of(i, fields), false);
{noformat}
Here we get a RexInputRef from RexInputRef.of, which creates field names via
a method named createName(int index); the NAMES list it reads is a
SelfPopulatingList. SelfPopulatingList is described as a thread-safe list,
but in fact it is thread-unsafe: when NAMES.get(index) is called
concurrently, it can return wrong results. We expect NAMES to be
{$0, $1, $2, ..., $n}, but under concurrent access it may become
{$0, $1, ..., $29, $30, ..., $59, $30, $31, ..., $59, ...}, with duplicated
entries.
Now look at the method registerInternal:
{noformat}
private RexLocalRef registerInternal(RexNode expr, boolean force) {
  expr = simplify(expr);

  RexLocalRef ref;
  final Pair<String, String> key;
  if (expr instanceof RexLocalRef) {
    key = null;
    ref = (RexLocalRef) expr;
  } else {
    key = RexUtil.makeKey(expr);
    ref = exprMap.get(key);
  }
  if (ref == null) {
    if (validating) {
      validate(expr, exprList.size());
    }
    // ... (rest of the method elided)
{noformat}
Here makeKey(expr) is expected to produce a distinct key per expression, but
it produces duplicate keys (because of the corrupted names), so addExpr(expr)
is called fewer times than it should be. In that method:
{noformat}
RexLocalRef ref;
final int index = exprList.size();
exprList.add(expr);
ref =
    new RexLocalRef(
        index,
        expr.getType());
localRefList.add(ref);
return ref;
{noformat}
localRefList ends up with the wrong size, so at line 939,
{noformat}
final RexLocalRef ref = localRefList.get(index);
{noformat}
an IndexOutOfBoundsException is thrown.

Bug fix:
We can't change Calcite's code before this bug is fixed upstream, but we can
pre-populate NAMES in RexInputRef at startup. Just add
{noformat}
RexInputRef.createName(2048);
{noformat}
to the bootstrap.
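
To illustrate the race described above, here is a small self-contained demo of an unsynchronized self-populating list. It mimics, but is not, Calcite's SelfPopulatingList; under concurrent access it can produce duplicate names (or even throw from the unsynchronized ArrayList), and pre-populating it up front, as suggested, avoids the race:
{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class UnsafeSelfPopulatingDemo {
  static final List<String> NAMES = new ArrayList<>();

  static String name(int index) {
    while (NAMES.size() <= index) {   // unsynchronized check-then-act:
      NAMES.add("$" + NAMES.size());  // two threads can interleave here,
    }                                 // appending duplicate entries
    return NAMES.get(index);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 8; i++) {
      pool.submit(() -> {
        for (int j = 0; j < 1000; j++) {
          name(j);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    // With the race, NAMES may hold duplicates such as "$30" twice, so an
    // index no longer matches its name -- the corruption reported above.
    System.out.println("size=" + NAMES.size()
        + " distinct=" + NAMES.stream().distinct().count());
  }
}
{noformat}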





[jira] [Created] (DRILL-4176) Dynamic Schema Discovery is not done in case of Drill- Hive

2015-12-08 Thread Devender Yadav (JIRA)
Devender Yadav  created DRILL-4176:
--

 Summary: Dynamic Schema Discovery is not done in case of Drill- 
Hive
 Key: DRILL-4176
 URL: https://issues.apache.org/jira/browse/DRILL-4176
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Devender Yadav 


I am using Hive with Drill.

Storage plugin info:

{noformat}
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/metastore_hive",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "root",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "fs.default.name": "file:///",
    "hive.metastore.sasl.enabled": "false"
  }
}
{noformat}

It's working fine for querying and everything else.

Then I wanted to check whether Drill automatically discovers newly created
tables in Hive. I started Drill in embedded mode and switched to a particular
database in Hive using:

use hive.testDB;

Here testDB is a database in Hive with tables t1 and t2. Then I queried:

show tables;

It gave me the table names t1 and t2. I then created a table t3 in Hive and
fired "show tables;" in Drill again. It still showed only t1 and t2. After
5-10 minutes I fired "show tables;" once more, and it showed t1, t2, and t3.

I think it should show t3 immediately after t3 is added in Hive.

What could be the reason for this behavior, and how does Drill handle it
internally?
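
The several-minute lag is consistent with Drill caching Hive metastore responses behind a short TTL instead of calling the metastore on every query (compare the DRILL-4127 pull request above, which reduces metastore client API calls). A generic sketch of that caching pattern with Guava, using a hypothetical five-minute TTL; this is not Drill's actual code:
{noformat}
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class TableNameCache {
  private final LoadingCache<String, List<String>> tablesByDb =
      CacheBuilder.newBuilder()
          .expireAfterWrite(5, TimeUnit.MINUTES) // hypothetical TTL
          .build(new CacheLoader<String, List<String>>() {
            @Override
            public List<String> load(String dbName) {
              return fetchTableNamesFromMetastore(dbName);
            }
          });

  public List<String> tables(String dbName) {
    // "show tables" is served from the cache until the entry expires,
    // which would explain a new Hive table appearing only minutes later.
    return tablesByDb.getUnchecked(dbName);
  }

  private List<String> fetchTableNamesFromMetastore(String dbName) {
    throw new UnsupportedOperationException("metastore call goes here");
  }
}
{noformat}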


