[jira] [Created] (DRILL-3691) CTAS Memory Leak : IllegalStateException
Rahul Challapalli created DRILL-3691:

Summary: CTAS Memory Leak : IllegalStateException
Key: DRILL-3691
URL: https://issues.apache.org/jira/browse/DRILL-3691
Project: Apache Drill
Issue Type: Bug
Components: Storage - Parquet
Reporter: Rahul Challapalli
Assignee: Steven Phillips

git.commit.id.abbrev=55dfd0e

The CTAS statement below fails with a memory leak. The query runs on top of TPC-H SF100 data.

{code}
create table lineitem as select * from dfs.`/drill/testdata/tpch100/lineitem`;
java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Failure while closing accountor. Expected private and shared pools to be set to initial values. However, one or more were not. Stats are
zone init allocated delta
private 100 100 0
shared 00 9998410176 589824.

Fragment 1:19

[Error Id: ba8fedf2-be40-4488-af2e-b6034527c943 on qa-node191.qa.lab:31010]
Aborting command set because force is false and command failed: create table lineitem as select * from dfs.`/drill/testdata/tpch100/lineitem`;
{code}

I attached the log file. I am not uploading the data as it is too large.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Lucene Format Plugin
Hi Stefan,

I was not able to make any further progress on this. Below is a high-level list of things to do:

1. Clean up LuceneScanSpec: The current implementation serializes a lot of low-level state information in order to serialize/de-serialize Lucene's SegmentReader. This has to change; otherwise the plugin is tightly coupled to Lucene's implementation details.
2. Serialization of the Lucene Query object.
3. Convert the SQL filter into a Lucene Query object: I just started on it and made it work in the simplest case. You can take a look at it here: https://github.com/rchallapalli/drill/blob/lucene/contrib/format-lucene/src/main/java/org/apache/drill/exec/planner/logical/SqlFilterToLuceneQuery.java
As part of the ElasticSearch storage plugin, Andrew has converted the SQL filter to an ElasticSearch query. It looks like he handled many cases. We can leverage this for the Lucene format plugin. Below is his code: https://github.com/aleph-zero/drill/blob/elastic/contrib/storage-elasticsearch/src/main/java/org/apache/drill/exec/store/elasticsearch/rules/PredicateAnalyzer.java
4. Currently the Lucene format plugin does not work on HDFS/MapR-FS. This should be handled.
5. Pushing aggregate functions and limits into the scan. (This will be an improvement.)
6. Testing.

I want to work on (1) sometime next week.

- Rahul

On Sat, Aug 22, 2015 at 12:00 AM, Stefán Baxter ste...@activitystream.com wrote:
Hi Rahul, Can you elaborate a bit on the status of the Lucene plugin and what needs to be done before using it? Also let me know if there are specific things that need improving. We want to try using it in our project and perhaps we can contribute something meaningful. Regards, -Stefan

On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee smukher...@commvault.com wrote:
Hi Rahul, Thanks for sharing your code. I was trying to get a plugin for the Solr engine, but I thought of using Solr's REST API to do the queries, get schema metadata info, etc.
The goal for me is to expose a Solr engine to tools like Tableau or MS Excel so that users can work with the data there. I am still very new to this and there is a learning curve. It would be great if you could comment on or review whatever I've done so far. https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr Thanks, Sudip

-----Original Message-----
From: rahul challapalli [mailto:challapallira...@gmail.com]
Sent: 10 August 2015 AM 05:21
To: dev@drill.apache.org
Subject: Re: Lucene Format Plugin

Below is the link to my branch, which contains the changes related to the format plugin. https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene Any thoughts on how to handle contributions like this which still have some work to be done? - Rahul

On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli challapallira...@gmail.com wrote:
Thanks Jason. I want to look at the Solr plugin and see where we can collaborate or whether we have already duplicated part of the effort. I still need to push a few commits. I will share the code once I get these changes pushed. - Rahul

On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse altekruseja...@gmail.com wrote:
Hey Rahul, This is really cool! Thanks for all of the time you put into writing this. I think we have a lot of opportunities to reach new communities with efforts like this. I noticed last week that another contributor opened a JIRA for a Solr plugin. There might be a good opportunity for the two of you to join efforts, as I believe he likely started working on a Lucene reader as part of his Solr work. Would you like to post a link to your work on GitHub or another public host of your code? https://issues.apache.org/jira/browse/DRILL-3585

On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter ste...@activitystream.com wrote:
Hi, I'm pretty new around here but I just wanted to tell you how much your work can benefit us. This is great! Look forward to trying it out.
Regards, -Stefán

On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli challapallira...@gmail.com wrote:
Hello Drillers,

I have been working on a Lucene format plugin. In its current state, the sample query below successfully searches a Lucene index and returns the results.

select path from dfs_test.`/search-index` where contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'

*High Level Overview of Current Implementation:*

*Parallelization:* A Lucene segment is the lowest level of parallelization.

*Filter Pushdown:* Currently the format plugin is designed to push the complete filter into the scan.

*Filter Evaluation:* Each condition in the filter is treated as a Lucene TermQuery http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html and multiple conditions are joined using a BooleanQuery
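The filter evaluation described in the message above (one term query per condition, with multiple conditions joined by a boolean query) can be illustrated with a minimal sketch. This is plain Python with stand-in classes, not Lucene's API and not the plugin's code: the names `TermQuerySketch`, `BooleanQuerySketch`, `filter_to_query`, and `matches` are hypothetical, and treating SQL `and` as "every term must match" is an assumption about how the conditions are joined.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class TermQuerySketch:
    """Stand-in for a term query: matches documents whose field contains the exact term."""
    field: str
    text: str


@dataclass(frozen=True)
class BooleanQuerySketch:
    """Stand-in for a boolean query in which every clause is required (assumed AND semantics)."""
    must: Tuple[TermQuerySketch, ...]


Query = Union[TermQuerySketch, BooleanQuerySketch]


def filter_to_query(conditions: List[Tuple[str, str]]) -> Query:
    """Turn a list of (field, value) equality conditions joined by AND into a query."""
    if not conditions:
        raise ValueError("at least one condition is required")
    terms = tuple(TermQuerySketch(field, value) for field, value in conditions)
    return terms[0] if len(terms) == 1 else BooleanQuerySketch(must=terms)


def matches(query: Query, doc_terms: Dict[str, Set[str]]) -> bool:
    """Evaluate a query against one document, given as field -> set of indexed terms."""
    if isinstance(query, TermQuerySketch):
        return query.text in doc_terms.get(query.field, set())
    return all(matches(clause, doc_terms) for clause in query.must)


# The two conditions from the sample query in the message.
sample_query = filter_to_query(
    [("contents", "maxItemsPerBlock"), ("contents", "BlockTreeTermsIndex")]
)
```

A document whose `contents` field holds both terms matches `sample_query`; one that holds only `maxItemsPerBlock` does not.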
Re: New JIRA Python tool
The general pattern we have adopted in the Drill community is to format the commit message like this:

DRILL-<jira number>: Description of what was fixed

As long as you follow that pattern, I don't think there are really any other expectations for making the pull request.

On Sat, Aug 22, 2015 at 11:24 AM, Edmon Begoli ebeg...@gmail.com wrote:
Sounds good. Two related questions:
1. Are there any special procedures regarding the pull request, referencing the issue in commit messages, etc.?
2. Once I figure out how to use the new JIRA Python tool, how do I submit the updates for the Drill contribution and patching documentation? Is the web documentation also maintained under the repo?
Thank you, Edmon

On Saturday, August 22, 2015, Hsuan-Yi Chu hsua...@usc.edu wrote:
Hi Edmon, Thanks for bringing this up. I just tried, and easy_install does not work on my laptop either. In my experience, for requesting reviews and submitting patches, you can send a pull request on GitHub. That is probably the most common approach now. As for the documentation, I think updating it with the correct information is a good idea too.

On Sat, Aug 22, 2015 at 10:50 AM, Edmon Begoli ebeg...@gmail.com wrote:
What is the suitable replacement for the JIRA Python tool (jira-python) still specified on the contribution web site? https://drill.apache.org/docs/drill-patch-review-tool/ For me, easy_install is not finding the jira-python library. It looks like this is the right tool: http://pythonhosted.org/jira/ (installed with pip as just jira), but the setup for patch submission might be slightly different. If I am seeing this right and the new tool is needed, we should probably update the documentation (I will be happy to do so). Thanks, Edmon
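The commit-message convention described at the top of this reply could be checked locally before opening a pull request. The sketch below is only an illustration: it assumes the pattern means "DRILL-", a numeric issue id, a colon, a space, then a non-empty description, and the names `COMMIT_PATTERN` and `parse_commit_subject` are hypothetical, not part of any Drill tooling.

```python
import re

# Assumed reading of the convention: "DRILL-<number>: <description>".
COMMIT_PATTERN = re.compile(r"DRILL-(\d+): (\S.*)")


def parse_commit_subject(subject):
    """Return (issue_key, description) if the subject follows the pattern, else None."""
    match = COMMIT_PATTERN.fullmatch(subject)
    if match is None:
        return None
    return f"DRILL-{match.group(1)}", match.group(2)
```

For example, `parse_commit_subject("DRILL-3690: Preserve NOT when rewriting partition filters")` yields the issue key and the description, while a subject without the `DRILL-` prefix yields `None`.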
[jira] [Created] (DRILL-3690) Partitioning pruning produces wrong results when there are nested expressions in the filter
Mehant Baid created DRILL-3690:

Summary: Partitioning pruning produces wrong results when there are nested expressions in the filter
Key: DRILL-3690
URL: https://issues.apache.org/jira/browse/DRILL-3690
Project: Apache Drill
Issue Type: Bug
Reporter: Mehant Baid
Assignee: Mehant Baid
Priority: Blocker
Fix For: 1.2.0

Consider the following query:

select 1 from foo where dir0 not in (1994) and dir1 not in (1995);

The filter condition is: AND(NOT(=($1, 1994)), NOT(=($2, 1995)))

In FindPartitionCondition we rewrite the filter to cherry-pick the partition column conditions so the interpreter can evaluate them. However, when the expression contains more than two levels of nesting (in this case AND(NOT(=))), the expression does not get rewritten correctly. In this case the expression gets rewritten as: AND(=($1, 1994), =($2, 1995)). The NOT is missing from the rewritten expression, which produces wrong results.
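To make the failure mode concrete, here is a toy model of the rewrite in Python. It is not Drill's FindPartitionCondition: expressions are nested tuples, the columns are named `dir0` and `dir1` instead of `$1` and `$2`, and every function name is hypothetical. `drop_not` reproduces the symptom reported above, and `keep_partition_conditions` shows one way a rewrite can keep NOT intact, under the assumption that a NOT is kept only when everything beneath it refers to partition columns.

```python
# Expressions are nested tuples: ("AND", a, b), ("NOT", a), ("=", column, value).
PARTITION_COLUMNS = {"dir0", "dir1"}  # assumed partition columns for this example


def evaluate(expr, row):
    """Evaluate an expression against a row given as a column -> value dict."""
    op = expr[0]
    if op == "=":
        return row[expr[1]] == expr[2]
    if op == "NOT":
        return not evaluate(expr[1], row)
    if op == "AND":
        return all(evaluate(e, row) for e in expr[1:])
    raise ValueError(f"unsupported operator: {op}")


def drop_not(expr):
    """Faulty rewrite that loses NOT, mimicking the reported symptom."""
    op = expr[0]
    if op == "NOT":
        return drop_not(expr[1])
    if op == "AND":
        return ("AND",) + tuple(drop_not(e) for e in expr[1:])
    return expr


def only_partition_columns(expr):
    """True if every comparison in expr refers to a partition column."""
    if expr[0] == "=":
        return expr[1] in PARTITION_COLUMNS
    return all(only_partition_columns(e) for e in expr[1:])


def keep_partition_conditions(expr):
    """Keep the conjuncts that use only partition columns; NOT subtrees stay whole."""
    if expr[0] == "AND":
        kept = [k for k in map(keep_partition_conditions, expr[1:]) if k is not None]
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else ("AND",) + tuple(kept)
    # "=" or "NOT": keep the whole subtree or drop it, never strip the NOT.
    return expr if only_partition_columns(expr) else None


filt = ("AND", ("NOT", ("=", "dir0", 1994)), ("NOT", ("=", "dir1", 1995)))
row = {"dir0": 1993, "dir1": 1996}  # a partition the query should keep

print(evaluate(filt, row))                             # True: the row qualifies
print(evaluate(drop_not(filt), row))                   # False: wrongly pruned
print(keep_partition_conditions(filt) == filt)         # True: NOT is preserved
```

With the NOT dropped, the partition `dir0=1993, dir1=1996` is pruned even though it satisfies the original filter, which is the wrong-result behavior the issue describes.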
New JIRA Python tool
What is the suitable replacement for the JIRA Python tool (jira-python) still specified on the contribution web site? https://drill.apache.org/docs/drill-patch-review-tool/

For me, easy_install is not finding the jira-python library. It looks like this is the right tool: http://pythonhosted.org/jira/ (installed with pip as just jira), but the setup for patch submission might be slightly different.

If I am seeing this right and the new tool is needed, we should probably update the documentation (I will be happy to do so).

Thanks, Edmon
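For anyone trying the pip-installed `jira` package mentioned above, a minimal sketch of reading one issue might look like the following. It does not reproduce the patch-review tool's setup. The helper names `browse_url` and `fetch_summary` are hypothetical, and anonymous read access to the Apache JIRA server is an assumption. The package import sits inside `fetch_summary` so the rest of the snippet loads without the package installed.

```python
JIRA_SERVER = "https://issues.apache.org/jira"


def browse_url(issue_key):
    """Build the browser URL for an issue, following the links used in the JIRA notifications."""
    return f"{JIRA_SERVER}/browse/{issue_key}"


def fetch_summary(issue_key):
    """Fetch an issue summary with the `jira` package (pip install jira); needs network access."""
    from jira import JIRA  # imported lazily; assumes the package is installed

    client = JIRA(server=JIRA_SERVER)  # anonymous session; assumes the issue is publicly readable
    return client.issue(issue_key).fields.summary
```

Calling `fetch_summary("DRILL-3691")` requires network access and the installed package, while `browse_url` works offline.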
Re: New JIRA Python tool
Hi Edmon, Thanks for bringing this up. I just tried, and easy_install does not work on my laptop either. In my experience, for requesting reviews and submitting patches, you can send a pull request on GitHub. That is probably the most common approach now. As for the documentation, I think updating it with the correct information is a good idea too.

On Sat, Aug 22, 2015 at 10:50 AM, Edmon Begoli ebeg...@gmail.com wrote:
What is the suitable replacement for the JIRA Python tool (jira-python) still specified on the contribution web site? https://drill.apache.org/docs/drill-patch-review-tool/ For me, easy_install is not finding the jira-python library. It looks like this is the right tool: http://pythonhosted.org/jira/ (installed with pip as just jira), but the setup for patch submission might be slightly different. If I am seeing this right and the new tool is needed, we should probably update the documentation (I will be happy to do so). Thanks, Edmon
Re: New JIRA Python tool
Sounds good. Two related questions:
1. Are there any special procedures regarding the pull request, referencing the issue in commit messages, etc.?
2. Once I figure out how to use the new JIRA Python tool, how do I submit the updates for the Drill contribution and patching documentation? Is the web documentation also maintained under the repo?
Thank you, Edmon

On Saturday, August 22, 2015, Hsuan-Yi Chu hsua...@usc.edu wrote:
Hi Edmon, Thanks for bringing this up. I just tried, and easy_install does not work on my laptop either. In my experience, for requesting reviews and submitting patches, you can send a pull request on GitHub. That is probably the most common approach now. As for the documentation, I think updating it with the correct information is a good idea too.

On Sat, Aug 22, 2015 at 10:50 AM, Edmon Begoli ebeg...@gmail.com wrote:
What is the suitable replacement for the JIRA Python tool (jira-python) still specified on the contribution web site? https://drill.apache.org/docs/drill-patch-review-tool/ For me, easy_install is not finding the jira-python library. It looks like this is the right tool: http://pythonhosted.org/jira/ (installed with pip as just jira), but the setup for patch submission might be slightly different. If I am seeing this right and the new tool is needed, we should probably update the documentation (I will be happy to do so). Thanks, Edmon