[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

2019-05-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842635#comment-16842635
 ] 

ASF GitHub Bot commented on DRILL-2362:
---

kkhatua commented on issue #1750: DRILL-2362: Profile Mgmt
URL: https://github.com/apache/drill/pull/1750#issuecomment-493610297
 
 
   @arina-ielchiieva, could you please review this PR?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0
>
>
> We collect query profile information for analysis purposes, but we keep it 
> forever. At this time, for a few queries, it isn't a problem. But as users 
> start putting Drill into production, automated use via other applications 
> will make this grow quickly. We need to come up with a retention policy 
> mechanism, with suitable settings administrators can use, and implement it so 
> that this data can be cleaned up.
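A minimal sketch of the kind of age-based retention sweep the description asks for, written in Java since that is Drill's language. It assumes profiles are stored as individual files ending in `.sys.drill` under a local directory, and it uses each file's last-modified time as a stand-in for the query's age. `ProfileRetention`, `purgeOlderThan` and the 30-day window are hypothetical names and values, not Drill settings or APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical age-based retention sweep over a local directory of query profiles. */
public class ProfileRetention {
  // Assumed suffix of a stored profile file; verify against your Drill version.
  static final String PROFILE_SUFFIX = ".sys.drill";

  /** Deletes profiles under root last modified before (now - maxAge); returns the deleted paths. */
  static List<Path> purgeOlderThan(Path root, Duration maxAge, Instant now) throws IOException {
    Instant cutoff = now.minus(maxAge);
    List<Path> candidates;
    // Collect first so that nothing is deleted while the directory walk is still open.
    try (Stream<Path> walk = Files.walk(root)) {
      candidates = walk
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(PROFILE_SUFFIX))
          .collect(Collectors.toList());
    }
    List<Path> deleted = new ArrayList<>();
    for (Path p : candidates) {
      FileTime modified = Files.getLastModifiedTime(p);
      if (modified.toInstant().isBefore(cutoff)) {
        Files.delete(p);
        deleted.add(p);
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : "profiles");
    if (!Files.isDirectory(root)) {
      System.err.println("No profile directory at " + root);
      return;
    }
    // Hypothetical 30-day retention window; a real setting would be admin-configurable.
    List<Path> removed = purgeOlderThan(root, Duration.ofDays(30), Instant.now());
    System.out.println("Deleted " + removed.size() + " profile(s)");
  }
}
```

A production version would also need to cover profile stores on a DFS, which this local-filesystem sketch does not attempt.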



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

2017-08-29 Thread Rahul Raj (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146664#comment-16146664
 ] 

Rahul Raj commented on DRILL-2362:
--

We had more than 200,000 profiles stored in the default location, and it made 
accessing the profiles page extremely slow. Also, while this page was being 
accessed, client connections were dropped (channel closed exception). This 
could be related to a long GC pause causing missed heartbeats.

-Rahul

> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
> Fix For: Future



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

2019-01-10 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739865#comment-16739865
 ] 

Kunal Khatua commented on DRILL-2362:
-

Hoping to resolve this in combination with the fix for DRILL-5270.

> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.16.0





[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

2019-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818213#comment-16818213
 ] 

ASF GitHub Bot commented on DRILL-2362:
---

kkhatua commented on pull request #1750: DRILL-2362: Profile Mgmt
URL: https://github.com/apache/drill/pull/1750
 
 
   This PR is a WIP for managing a large number of profiles. It involves the 
following features:
   
   1. Write profiles to indexed partitions (created on the fly; by default, 
organized in nested directories by year, month and date).
   2. Read chronologically from the above partitioned dirs. This improves 
performance by scanning and retrieving only the most recent profiles.
   3. Leverage a Guava cache to avoid the cost of deserializing a profile from 
disk multiple times (even a single attempt at rendering a profile leads to at 
least two deserializations).
   4. Infer which partitioned dir holds a profile from the queryId alone. Rather 
than scanning all the directories, we reverse-engineer the query ID to 
determine the approximate start time of the query and narrow down the 
profile's location.
   5. Trace exceptions and make the UI more robust in handling bad profiles that 
can't be deserialized (qId: 259432dc-7f8e-8fc5-af69-16a1ca817689 is a sample 
bad profile).
   6. Auto-index existing profiles from the root dir the first time (in batches 
of 1), synchronizing if distributed. Using ZK, synchronization is maintained 
when multiple Drillbits share the same profile location.
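Items 1 and 4 can be sketched together as follows. The sketch assumes that a query ID's string form is a UUID built from two 64-bit halves, and that the high 32 bits of the first half hold `Integer.MAX_VALUE` minus the query's start time in epoch seconds, with a random int added to the low bits. That is my reading of Drill's query ID generator and should be verified against the version in use. The `yyyy/MM/dd` layout and the names `ProfilePartitions`, `approxStartTime` and `partitionDir` are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

/**
 * Hypothetical helper that estimates a query's start time from its ID and maps
 * it to a nested year/month/day partition directory.
 *
 * ASSUMPTION: the high 32 bits of the ID's first 64-bit half hold
 * (Integer.MAX_VALUE - startEpochSeconds); the low 32 bits are random.
 */
public class ProfilePartitions {
  // Hypothetical layout: yyyy/MM/dd in UTC, relative to the profile root.
  private static final DateTimeFormatter PARTITION_FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  /** Approximate start time; a negative random low half can make it one second late. */
  static Instant approxStartTime(String queryId) {
    long part1 = UUID.fromString(queryId).getMostSignificantBits();
    long encoded = part1 >>> 32; // keep only the 32 time-derived bits
    return Instant.ofEpochSecond(Integer.MAX_VALUE - encoded);
  }

  /** Partition directory that would hold a profile started at the given instant. */
  static String partitionDir(Instant start) {
    return PARTITION_FORMAT.format(start);
  }

  public static void main(String[] args) {
    // Sample query ID quoted in item 5 of the PR description.
    String queryId = "259432dc-7f8e-8fc5-af69-16a1ca817689";
    Instant start = approxStartTime(queryId);
    System.out.println(start + " -> " + partitionDir(start));
  }
}
```

Under that assumption, the sample ID decodes to 2018-01-27T00:51:47Z, i.e. partition `2018/01/27`. Because the random low bits can borrow from the high half, the estimate may be one second late, so a lookup just after midnight should also check the previous day's directory.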
   
 



> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0





[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

2019-04-15 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818218#comment-16818218
 ] 

ASF GitHub Bot commented on DRILL-2362:
---

kkhatua commented on issue #1750: DRILL-2362: Profile Mgmt
URL: https://github.com/apache/drill/pull/1750#issuecomment-483352093
 
 
   @arina-ielchiieva I'm looking for a suggestion on how to manage existing 
profiles. 
   Currently, during startup, we automatically index any profiles in the root 
of the `profiles` dir into their respective location. By default, I have set 
it to 1000, and the Drillbits synchronize if the profile dir is on a DFS.
   However, I am not sure if we should automatically index any new profile that 
has been copy-pasted into the root dir. For example, we might get a profile from 
a JIRA and would like to view it. Should we leave it there (and try to render 
it) or should we index it ASAP?
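The startup indexing described above could look roughly like the sketch below, assuming the 1000 default mentioned in the comment refers to the batch size: each call moves at most one batch of loose profiles from the root of the `profiles` dir into dated sub-directories. Unlike the PR, which derives the location from the query ID, this sketch uses each file's last-modified time (UTC) as the partition key. It also assumes the `.sys.drill` suffix and ignores the ZK-based synchronization between Drillbits. All names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batch indexer: moves loose root-level profiles into dated sub-directories. */
public class RootProfileIndexer {
  static final String PROFILE_SUFFIX = ".sys.drill"; // assumed profile file suffix
  static final DateTimeFormatter PARTITION_FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  /** Partition key for a profile; ASSUMPTION: last-modified time stands in for start time. */
  static String partitionFor(Path profile) throws IOException {
    return PARTITION_FORMAT.format(Files.getLastModifiedTime(profile).toInstant());
  }

  /** Moves at most batchSize profiles out of root; returns how many were moved. */
  static int indexBatch(Path root, int batchSize) throws IOException {
    List<Path> batch = new ArrayList<>();
    // Only direct children matching the suffix; already-indexed profiles live in sub-dirs.
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(root, "*" + PROFILE_SUFFIX)) {
      for (Path p : entries) {
        if (batch.size() >= batchSize) {
          break;
        }
        if (Files.isRegularFile(p)) {
          batch.add(p);
        }
      }
    }
    // Move only after the listing is closed, so the directory is not modified mid-iteration.
    for (Path p : batch) {
      Path targetDir = root.resolve(partitionFor(p));
      Files.createDirectories(targetDir);
      Files.move(p, targetDir.resolve(p.getFileName()));
    }
    return batch.size();
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : "profiles");
    int moved = indexBatch(root, 1000); // assumed default batch size
    System.out.println("Indexed " + moved + " profile(s)");
  }
}
```

Calling `indexBatch` repeatedly until it returns 0 drains the root dir one batch at a time.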
 



> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.17.0





[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

2022-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608728#comment-17608728
 ] 

ASF GitHub Bot commented on DRILL-2362:
---

jnturton commented on PR #1750:
URL: https://github.com/apache/drill/pull/1750#issuecomment-1256153990

   Hi Kunal! Thank you for this contribution; I'd like to help move it 
forward. 
   
   > However, I am not sure if we should automatically index any new profile 
that has been copy-pasted into the root dir. For example, we might get a profile 
from a JIRA and would like to view it. Should we leave it there (and try to 
render it) or should we index it ASAP?
   
   I share your concern here.  I think we should consider having Drill only 
write _new_ profiles to partitioned directories. Any partitioning of historical 
profiles can be done externally by admins, in my opinion, and we can add 
examples of "housekeeping" scripts for doing that to the Drill documentation.
   
   Would you like to do any of the following?
   
   - Resume the work on this PR, I volunteer to be a reviewer.
   - Receive a PR from me to your fork here that's rebased and has some changes 
I think we want.
   - You're too busy now, so I'll pull your commits here into a new branch of 
my own and open a new PR.
   
   Thanks
   James




> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

2022-09-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608731#comment-17608731
 ] 

ASF GitHub Bot commented on DRILL-2362:
---

jnturton commented on PR #1750:
URL: https://github.com/apache/drill/pull/1750#issuecomment-1256165816

   Or, if we do want to keep this built-in ability to partition existing 
profiles, perhaps we should have it launched from a button on the Profiles page 
in the web UI instead of on Drillbit startup? That would remove the 
complication of which Drillbit does the work and the worries of slowing down 
startup or partitioning profiles that nobody wanted partitioned.




> Drill should manage Query Profiling archiving
> -
>
> Key: DRILL-2362
> URL: https://issues.apache.org/jira/browse/DRILL-2362
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 0.7.0
>Reporter: Chris Westin
>Priority: Major


