[jira] [Created] (DRILL-3691) CTAS Memory Leak : IllegalStateException

2015-08-22 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-3691:


 Summary: CTAS Memory Leak : IllegalStateException
 Key: DRILL-3691
 URL: https://issues.apache.org/jira/browse/DRILL-3691
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Reporter: Rahul Challapalli
Assignee: Steven Phillips


git.commit.id.abbrev=55dfd0e

The CTAS statement below fails with a memory leak. The query runs on top of 
TPC-H SF100 data.
{code}
create table lineitem as select * from dfs.`/drill/testdata/tpch100/lineitem`;
java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Failure while 
closing accountor.  Expected private and shared pools to be set to initial 
values.  However, one or more were not.  Stats are
zone     init  allocated   delta
private  100   100         0
shared   00    9998410176  589824.

Fragment 1:19

[Error Id: ba8fedf2-be40-4488-af2e-b6034527c943 on qa-node191.qa.lab:31010]
Aborting command set because force is false and command failed: create table 
lineitem as select * from dfs.`/drill/testdata/tpch100/lineitem`;
{code}
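
For readers unfamiliar with this check, the following is a minimal Python sketch of the kind of accounting the error message describes. It is not Drill's allocator code: the Pool and Accountor classes, their methods, and the byte counts are all hypothetical. It only illustrates that closing fails when a pool has not returned to its initial value.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical memory pool that remembers its initial size."""
    name: str
    init: int
    available: int

    @property
    def delta(self) -> int:
        # Bytes reserved from this pool and not yet released.
        return self.init - self.available


class Accountor:
    """Hypothetical per-fragment accountor with a private and a shared pool."""

    def __init__(self, private_bytes: int, shared_bytes: int) -> None:
        self.pools = {
            "private": Pool("private", private_bytes, private_bytes),
            "shared": Pool("shared", shared_bytes, shared_bytes),
        }

    def reserve(self, pool: str, size: int) -> None:
        self.pools[pool].available -= size

    def release(self, pool: str, size: int) -> None:
        self.pools[pool].available += size

    def close(self) -> None:
        # Every pool must be back at its initial value, otherwise memory leaked.
        leaked = {p.name: p.delta for p in self.pools.values() if p.delta != 0}
        if leaked:
            raise RuntimeError(f"Failure while closing accountor: {leaked}")
```

A buffer that is reserved but never released before close() leaves a nonzero delta, which is the situation the shared row in the error above reports.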

I attached the log file. I am not uploading the data because it is too large.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Lucene Format Plugin

2015-08-22 Thread rahul challapalli
Hi Stefan,

I was not able to make any further progress on this. Below is a high-level
list of things to do:

1. Clean up LuceneScanSpec: The current implementation serializes a lot of
low-level state information to serialize/de-serialize Lucene's
SegmentReader. This has to change; otherwise the plugin is tightly
coupled to Lucene's implementation details.
2. Serialization of the Lucene Query object.
3. Convert the SQL filter into a Lucene Query object: I just started on it and
made it work in the simplest case. You can take a look at it here:

https://github.com/rchallapalli/drill/blob/lucene/contrib/format-lucene/src/main/java/org/apache/drill/exec/planner/logical/SqlFilterToLuceneQuery.java
As part of the Elasticsearch storage plugin, Andrew has converted the
SQL filter to an Elasticsearch query. It looks like he handled many cases. We
can leverage this for the Lucene format plugin. Below is his code:

https://github.com/aleph-zero/drill/blob/elastic/contrib/storage-elasticsearch/src/main/java/org/apache/drill/exec/store/elasticsearch/rules/PredicateAnalyzer.java
4. Currently the Lucene format plugin does not work on HDFS/MapR-FS. This
should be handled.
5. Pushing aggregate functions and limits into the scan. (This will be an
improvement.)
6. Testing
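
To make item 3 concrete, here is a rough Python sketch of the idea for the simplest case (equality conditions joined by AND or OR). It is not the code in SqlFilterToLuceneQuery.java: the tuple-based filter representation and the to_lucene function are hypothetical, and it emits a string in Lucene's classic query parser syntax instead of building Query objects.

```python
def to_lucene(expr):
    """Translate a tiny, hypothetical filter tree into a Lucene query string.

    Supported shapes (assumed for this sketch):
      ("=", field, value)        -> field:"value"
      ("AND", child, child, ...) -> (child AND child ...)
      ("OR", child, child, ...)  -> (child OR child ...)
    """
    op = expr[0]
    if op == "=":
        _, field, value = expr
        # Escape backslashes and double quotes so the value stays one quoted term.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    if op in ("AND", "OR"):
        parts = [to_lucene(child) for child in expr[1:]]
        return "(" + f" {op} ".join(parts) + ")"
    # Anything else is left for Drill to evaluate after the scan.
    raise ValueError(f"unsupported operator: {op}")


sample = ("AND",
          ("=", "contents", "maxItemsPerBlock"),
          ("=", "contents", "BlockTreeTermsIndex"))
query = to_lucene(sample)
```

Raising on unsupported operators is a deliberate choice in this sketch: a caller can catch the error and keep that part of the filter in Drill instead of pushing it into the scan.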

I want to work on (1) sometime next week.

- Rahul


On Sat, Aug 22, 2015 at 12:00 AM, Stefán Baxter ste...@activitystream.com
wrote:

 Hi Rahul,

 Can you elaborate a bit on the status of the Lucene plugin and what needs
 to be done before using it?

 Also let me know if there are specific things that need improving. We want
 to try using it in our project and perhaps we can contribute something
 meaningful.

 Regards,
  -Stefan



 On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee smukher...@commvault.com
  wrote:

 Hi Rahul,

 Thanks for sharing your code. I was trying to build a plugin for the Solr
 engine, but I thought of using Solr's REST API to run the queries, get
 schema metadata, etc.
 The goal for me is to expose a Solr engine to tools like Tableau or MS
 Excel so that users can work with the data there.

 I am still very new to this and there is a learning curve. It would be
 great if you can comment/review whatever I've done so far.

 https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr

 Thanks,
 Sudip

 -Original Message-
 From: rahul challapalli [mailto:challapallira...@gmail.com]
 Sent: 10 August 2015 AM 05:21
 To: dev@drill.apache.org
 Subject: Re: Lucene Format Plugin

 Below is the link to my branch which contains the changes related to the
 format plugin.

 https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene

 Any thoughts on how to handle contributions like this which still have
 some work to be done?

 - Rahul


 On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli 
 challapallira...@gmail.com wrote:

  Thanks Jason.
 
  I want to look at the solr plugin and see where we can collaborate or
  if we already duplicated part of the effort.
 
  I still need to push a few commits. I will share the code once I get
  these changes pushed.
 
  - Rahul
 
 
 
  On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse
  altekruseja...@gmail.com
   wrote:
 
  Hey Rahul,
 
  This is really cool! Thanks for all of the time you put into writing
  this, I think we have a lot of available opportunities to reach new
  communities with efforts like this.
 
  I noticed last week another contributor opened a JIRA for a Solr
  plugin. There might be a good opportunity for the two of you to join
  efforts, as I believe he likely started working on a Lucene reader as
  part of his Solr work.
 
  Would you like to post a link to your work on Github or another
  public host of your code?
 
  https://issues.apache.org/jira/browse/DRILL-3585
 
  On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter
  ste...@activitystream.com
  wrote:
 
   Hi,
  
   I'm pretty new around here but I just wanted to tell you how much
   your work can benefit us. This is great!
  
   Look forward to trying it out.
  
   Regards,
-Stefán
  
   On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli 
   challapallira...@gmail.com wrote:
  
Hello Drillers,
   
I have been working on a Lucene format plugin. In its current state, the
sample query below successfully searches a Lucene index and returns the
results.

select path from dfs_test.`/search-index` where
contents = 'maxItemsPerBlock'
and contents = 'BlockTreeTermsIndex'
   
   
   
*High Level Overview of Current Implementation:*
   
*Parallelization:* A Lucene segment is the lowest level of
parallelization.
*Filter Pushdown:* Currently the format plugin is designed to
push the complete filter into the scan.
*Filter Evaluation:* Each condition in the filter is treated as a
Lucene TermQuery
http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html
and multiple conditions are joined using a BooleanQuery

Re: New JIRA Python tool

2015-08-22 Thread Steven Phillips
The general pattern we have adopted in the Drill community is to format
the commit message like this:

DRILL-<jira number>: Description of what was fixed

As long as you follow that pattern, I don't think there are really any
other expectations for making the pull request.
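
As an illustration only, a commit subject line can be checked against this pattern with a few lines of Python. The regular expression and the parse_commit_subject name are my own assumptions, not existing project tooling.

```python
import re

# Assumed shape: "DRILL-" + issue number + ": " + non-empty description.
COMMIT_SUBJECT = re.compile(r"^DRILL-(\d+): (\S.*)$")


def parse_commit_subject(subject):
    """Return (issue_number, description), or None if the pattern is not met."""
    match = COMMIT_SUBJECT.match(subject)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For example, parse_commit_subject("DRILL-3690: Keep NOT when rewriting partition filters") yields the issue number 3690 and the description, while a subject without the DRILL prefix yields None.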

On Sat, Aug 22, 2015 at 11:24 AM, Edmon Begoli ebeg...@gmail.com wrote:

 Sounds good. Two related questions:


 1. Are there any special procedures regarding the pull request, such as
 referencing the issue in the commit message?

 2. Once I figure out the new JIRA Python tool use, how do I submit the
 updates for the Drill contribution and patching documentation?
 Is web documentation also maintained under the repo?

 Thank you,
 Edmon

 On Saturday, August 22, 2015, Hsuan-Yi Chu hsua...@usc.edu wrote:

  Hi Edmon,
  Thanks for bringing this up. I just tried, and easy_install does not work
  on my laptop either.
 
  From my experience, for the purpose of requesting reviews/submitting
  patches, you could send a pull request on GitHub. That might be the most
  common way people are using now.
 
  For the documentation, I think updating it with the correct information
  is a good idea too.
 
  On Sat, Aug 22, 2015 at 10:50 AM, Edmon Begoli ebeg...@gmail.com wrote:
 
   What is the suitable replacement for the JIRA Python tool
   (jira-python) still specified on the contribution web site?
   https://drill.apache.org/docs/drill-patch-review-tool/
  
   For me, easy_install is not finding jira-python library.
  
   It looks like this is the right tool:
   http://pythonhosted.org/jira/
  
   Which is installed as just jira with pip, but it looks like the setup
   for patch submission might be slightly different.
  
   If I am seeing this right and the new tool is needed, we should
   probably update the documentation (I will be happy to do so).
  
   Thanks,
   Edmon
  
 



[jira] [Created] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter

2015-08-22 Thread Mehant Baid (JIRA)
Mehant Baid created DRILL-3690:
--

 Summary: Partitioning pruning produces wrong results when there 
are nested expressions in the filter
 Key: DRILL-3690
 URL: https://issues.apache.org/jira/browse/DRILL-3690
 Project: Apache Drill
  Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Blocker
 Fix For: 1.2.0


Consider the following query:
select 1 from foo where dir0 not in (1994) and dir1 not in (1995);

The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 1995)))
In FindPartitionCondition we rewrite the filter to cherry-pick the partition 
column conditions so the interpreter can evaluate them. However, when the 
expression contains more than two levels of nesting (in this case 
AND(NOT(=))), the expression does not get rewritten correctly. Here it is 
rewritten as AND(=($1, 1994), =($2, 1995)): the NOT is missing from the 
rewritten expression, which produces wrong results.
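
The intended behaviour of the rewrite can be sketched in Python as follows. This is not the FindPartitionCondition code: the tuple-based expression tree and both function names are hypothetical. The sketch drops conjuncts of an AND that touch non-partition columns, but keeps or drops every other node (including NOT) as a whole, so a NOT is never lost.

```python
def columns(expr):
    """Return the set of column indexes referenced by an expression tree."""
    if expr[0] == "col":
        return {expr[1]}
    if expr[0] == "lit":
        return set()
    return set().union(*(columns(child) for child in expr[1:]))


def partition_condition(expr, partition_cols):
    """Extract the part of a filter that uses only partition columns.

    Returns None when nothing can be extracted safely.
    """
    if expr[0] == "AND":
        # Dropping a conjunct only weakens the filter, which is safe for pruning.
        kept = [c for c in (partition_condition(child, partition_cols)
                            for child in expr[1:]) if c is not None]
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else ("AND", *kept)
    # NOT, OR, and comparisons are kept whole or dropped whole.
    return expr if columns(expr) <= partition_cols else None


# AND(NOT(=($1, 1994)), NOT(=($2, 1995))), with $1 and $2 as partition columns.
not_1994 = ("NOT", ("=", ("col", 1), ("lit", 1994)))
not_1995 = ("NOT", ("=", ("col", 2), ("lit", 1995)))
rewritten = partition_condition(("AND", not_1994, not_1995), {1, 2})
```

With both columns treated as partition columns, the rewritten expression is identical to the input, NOT nodes included.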







New JIRA Python tool

2015-08-22 Thread Edmon Begoli
What is the suitable replacement for the JIRA Python tool
(jira-python) still specified on the contribution web site?
https://drill.apache.org/docs/drill-patch-review-tool/

For me, easy_install is not finding jira-python library.

It looks like this is the right tool:
http://pythonhosted.org/jira/

Which is installed as just jira with pip, but it looks like the setup for
patch submission might be slightly different.

If I am seeing this right and the new tool is needed, we should probably
update the documentation (I will be happy to do so).

Thanks,
Edmon


Re: New JIRA Python tool

2015-08-22 Thread Hsuan-Yi Chu
Hi Edmon,
Thanks for bringing this up. I just tried, and easy_install does not work
on my laptop either.

From my experience, for the purpose of requesting reviews/submitting
patches, you could send a pull request on GitHub. That might be the most
common way people are using now.

For the documentation, I think updating it with the correct information
is a good idea too.

On Sat, Aug 22, 2015 at 10:50 AM, Edmon Begoli ebeg...@gmail.com wrote:

 What is the suitable replacement for the JIRA Python tool
 (jira-python) still specified on the contribution web site?
 https://drill.apache.org/docs/drill-patch-review-tool/

 For me, easy_install is not finding jira-python library.

 It looks like this is the right tool:
 http://pythonhosted.org/jira/

 Which is installed as just jira with pip, but it looks like the setup for
 patch submission might be slightly different.

 If I am seeing this right and the new tool is needed, we should probably
 update the documentation (I will be happy to do so).

 Thanks,
 Edmon



Re: New JIRA Python tool

2015-08-22 Thread Edmon Begoli
Sounds good. Two related questions:


1. Are there any special procedures regarding the pull request, such as
referencing the issue in the commit message?

2. Once I figure out the new JIRA Python tool use, how do I submit the
updates for the Drill contribution and patching documentation?
Is web documentation also maintained under the repo?

Thank you,
Edmon

On Saturday, August 22, 2015, Hsuan-Yi Chu hsua...@usc.edu wrote:

 Hi Edmon,
 Thanks for bringing this up. I just tried, and easy_install does not work
 on my laptop either.

 From my experience, for the purpose of requesting reviews/submitting
 patches, you could send a pull request on GitHub. That might be the most
 common way people are using now.

 For the documentation, I think updating it with the correct information
 is a good idea too.

 On Sat, Aug 22, 2015 at 10:50 AM, Edmon Begoli ebeg...@gmail.com wrote:

  What is the suitable replacement for the JIRA Python tool
  (jira-python) still specified on the contribution web site?
  https://drill.apache.org/docs/drill-patch-review-tool/
 
  For me, easy_install is not finding jira-python library.
 
  It looks like this is the right tool:
  http://pythonhosted.org/jira/
 
  Which is installed as just jira with pip, but it looks like the setup for
  patch submission might be slightly different.
 
  If I am seeing this right and the new tool is needed, we should probably
  update the documentation (I will be happy to do so).
 
  Thanks,
  Edmon