[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving
[ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842635#comment-16842635 ]

ASF GitHub Bot commented on DRILL-2362:
---------------------------------------

kkhatua commented on issue #1750: DRILL-2362: Profile Mgmt
URL: https://github.com/apache/drill/pull/1750#issuecomment-493610297

@arina-ielchiieva could you please review this PR?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.17.0
>
> We collect query profile information for analysis purposes, but we keep it
> forever. At this time, for a few queries, it isn't a problem. But as users
> start putting Drill into production, automated use via other applications
> will make this grow quickly. We need to come up with a retention policy
> mechanism, with suitable settings administrators can use, and implement it so
> that this data can be cleaned up.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving
[ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146664#comment-16146664 ]

Rahul Raj commented on DRILL-2362:
----------------------------------

We had more than 200,000 profiles stored in the default location, and this made access to the profile page extremely slow. While the page was being accessed, client connections were also dropped (channel closed exception). This could be related to a long GC pause causing missed heartbeats.

-Rahul

> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>             Fix For: Future
[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving
[ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739865#comment-16739865 ]

Kunal Khatua commented on DRILL-2362:
-------------------------------------

Hoping to resolve this in combination with the fix for DRILL-5270.

> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.16.0
[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving
[ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818213#comment-16818213 ]

ASF GitHub Bot commented on DRILL-2362:
---------------------------------------

kkhatua commented on pull request #1750: DRILL-2362: Profile Mgmt
URL: https://github.com/apache/drill/pull/1750

This PR is a WIP for managing a large number of profiles. It involves the following features:

1. Write profiles to indexed partitions (created on the fly; by default organized in nested directories by year, month and date).
2. Read chronologically from the above partitioned dirs. This improves performance by scanning and retrieving only the most recent profiles.
3. Leverage Guava Cache to save the cost of deserializing a profile from disk multiple times. (Even one attempt at rendering a profile leads to at least two deserializations.)
4. Infer which partitioned dir holds a profile based on the queryId alone. Rather than scanning all the directories, we reverse engineer the query ID to work out the approximate start time of the query and so narrow down the profile's location.
5. Trace exceptions and make the UI more robust in handling bad profiles that can't be deserialized. (Sample bad profile: qId 259432dc-7f8e-8fc5-af69-16a1ca817689.)
6. Auto-index existing profiles from the root dir the first time (in batches of 1; synchronized if distributed). Using ZK, synchronization is maintained when multiple Drillbits share the same profile location.

> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.17.0
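
The partitioned layout and the query-ID lookup described in items 1 and 4 of the PR description can be sketched roughly as follows. This is an illustration only, not the PR's code: the function names, the `YYYY/MM/DD` directory layout and the `.sys.drill` file suffix are assumptions, and the decoding assumes that the first four bytes of a query ID hold Java's `Integer.MAX_VALUE` minus the query start time in epoch seconds, which should be checked against the Drill source before relying on it.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def approx_start_time(query_id: str) -> datetime:
    """Estimate the query start time from the first 8 hex digits of the ID.

    Assumption: those 32 bits store INT_MAX - epoch_seconds, so newer
    queries get smaller values. The estimate may be off by a second.
    """
    high_bits = int(query_id.split("-")[0], 16)
    return datetime.fromtimestamp(INT_MAX - high_bits, tz=timezone.utc)


def partition_dir(root: str, start: datetime) -> PurePosixPath:
    """Nested year/month/day directory for a profile (hypothetical layout)."""
    return PurePosixPath(root) / f"{start:%Y}" / f"{start:%m}" / f"{start:%d}"


def profile_path(root: str, query_id: str) -> PurePosixPath:
    """Locate a profile from its query ID alone, without scanning directories."""
    start = approx_start_time(query_id)
    return partition_dir(root, start) / f"{query_id}.sys.drill"
```

Under the stated assumption, `profile_path("/profiles", "259432dc-7f8e-8fc5-af69-16a1ca817689")` would resolve to a directory for late January 2018 (UTC). Because the estimate is approximate, a real lookup would probably also need to check the neighbouring day when a query starts close to midnight.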
[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving
[ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818218#comment-16818218 ]

ASF GitHub Bot commented on DRILL-2362:
---------------------------------------

kkhatua commented on issue #1750: DRILL-2362: Profile Mgmt
URL: https://github.com/apache/drill/pull/1750#issuecomment-483352093

@arina-ielchiieva I'm looking for a suggestion on how to manage existing profiles.
Currently, during startup, we automatically index any profiles in the root of the `profiles` dir into their respective locations. As a default, I have set the batch size to 1000, and the Drillbits synchronize if the profile dir is on a DFS.

However, I am not sure whether we should automatically index any new profile that has been copy-pasted into the root dir. For example, we might get a profile from a JIRA and want to view it. Should we leave it there (and try to render it), or should we index it ASAP?

> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.17.0
[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving
[ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608728#comment-17608728 ]

ASF GitHub Bot commented on DRILL-2362:
---------------------------------------

jnturton commented on PR #1750:
URL: https://github.com/apache/drill/pull/1750#issuecomment-1256153990

Hi Kunal! Thank you for this contribution, I'd like to help to move it forward.

> However, I am not sure if we should automatically index any new profile that has been copy-pasted into the root dir. For e.g., we might get a profile from a JIRA and would like to view it. Should we leave it there (and try to render it) or should we index it ASAP ?

I share your concern here. I think we should consider having Drill only write _new_ profiles to partitioned directories. Any partitioning of historical profiles can be done externally by admins, in my opinion, and we can add examples of "housekeeping" scripts for doing that to the Drill documentation.

Would you like to do any of the following?
- Resume the work on this PR, I volunteer to be a reviewer.
- Receive a PR from me to your fork here that's rebased and has some changes I think we want.
- You're too busy now, so I'll pull your commits here into a new branch of my own and open a new PR.

Thanks
James

> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>            Priority: Major
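
A "housekeeping" script of the kind suggested in the comment above might look like the sketch below. It moves profile files lying loose in the root of a local profile directory into year/month/day subdirectories and deletes any that are older than a retention limit. The `*.sys.drill` file pattern, the use of file modification time as a stand-in for query time, the directory layout and the function name are all assumptions made for illustration; a profile store on a DFS would need that file system's own API.

```python
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Tuple


def tidy_profiles(root: Path, retention_days: int,
                  now: Optional[float] = None) -> Tuple[int, int]:
    """Partition or expire loose profiles in `root`; returns (moved, deleted)."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    moved = deleted = 0
    # Only files directly in the root are touched; subdirectories that
    # are already partitioned are left alone.
    for profile in sorted(root.glob("*.sys.drill")):
        mtime = profile.stat().st_mtime
        if mtime < cutoff:
            # Older than the retention window: remove it.
            profile.unlink()
            deleted += 1
            continue
        day = datetime.fromtimestamp(mtime, tz=timezone.utc)
        target = root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(profile), str(target / profile.name))
        moved += 1
    return moved, deleted
```

Running something like `tidy_profiles(Path("/var/drill/profiles"), retention_days=30)` from a scheduled job would keep this work outside the Drillbit, so it would not slow startup or require coordination between Drillbits. The path shown is a placeholder.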
[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving
[ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608731#comment-17608731 ]

ASF GitHub Bot commented on DRILL-2362:
---------------------------------------

jnturton commented on PR #1750:
URL: https://github.com/apache/drill/pull/1750#issuecomment-1256165816

Or, if we do want to keep this built-in ability to partition existing profiles, perhaps we should have it launched from a button on the Profiles page in the web UI instead of on Drillbit startup? That would remove the complication of deciding which Drillbit does the work, as well as the worries about slowing down startup or partitioning profiles that nobody wanted partitioned.

> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>            Priority: Major