[jira] [Commented] (PHOENIX-1197) Measure the performance impact of enabling tracing
[ https://issues.apache.org/jira/browse/PHOENIX-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341551#comment-15341551 ]

Pranavan commented on PHOENIX-1197:
---

Tracing is a specialized form of logging. Logs are primarily consumed by system administrators, whereas traces are primarily used by developers; the main intention is to assist developers. I found that the following enhancements to Phoenix would be useful.

Enhancements needed in Apache Phoenix:
1. The trace table grows rapidly. This growth should be stopped so that only relevant trace data is stored.
2. A permission model is needed for Apache Phoenix tracing.
3. Tracing should be agile, unlike logs.
4. Tracing is done at a lower level, so the volume of trace data is higher.

Currently, only the TRACE ON and TRACE OFF commands are supported. We need more granular control, because tracing can seriously affect performance.

> Measure the performance impact of enabling tracing
> --
>
> Key: PHOENIX-1197
> URL: https://issues.apache.org/jira/browse/PHOENIX-1197
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Labels: tracing
>
> In Phoenix 4.1, there's a new tracing capability. We should measure the
> impact of enabling this on a live cluster before turning it on in production.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected
[ https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338911#comment-15338911 ]

Pranavan commented on PHOENIX-2166:
---

Hi James,

I reformatted every file that I changed, which is why you are seeing a bunch of changes that are unrelated to my work. I will discuss this with my mentor and upload a new patch.

> Prevent writing to tracing table when tracing data collected
>
> Key: PHOENIX-2166
> URL: https://issues.apache.org/jira/browse/PHOENIX-2166
> Project: Phoenix
> Issue Type: Sub-task
> Affects Versions: 4.5.0
> Reporter: Mujtaba Chohan
> Labels: gsoc2016, tracing
> Attachments: PHOENIX-2166.patch
>
> When tracing is turned ON, the trace table grows at a fast pace and is filled
> with the following traces, which should not be present:
> {code}
> Executing UPSERT INTO SYSTEM.TRACING_STATS (trace_id, ...
> Writing mutation batch for table: SYSTEM.TRACING_STATS ...
> and so on
> {code}
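The issue text above shows the trace table filling with spans that describe writes to the trace table itself. One possible mitigation, sketched here under stated assumptions, is to drop such spans before they are persisted. This is not necessarily the approach taken in the attached patch, whose contents are not shown in this thread. The class and method names are hypothetical, and matching on the span description string is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Phoenix: filters out span descriptions
// that refer to the trace table, so that tracing does not trace itself.
final class TraceSelfWriteFilter {
    // Table name taken from the issue description.
    static final String TRACE_TABLE = "SYSTEM.TRACING_STATS";

    /** Returns true when a span description mentions the trace table. */
    static boolean isSelfTrace(String description) {
        return description != null
                && description.toUpperCase(Locale.ROOT).contains(TRACE_TABLE);
    }

    /** Returns the descriptions that do not refer to the trace table. */
    static List<String> filterPersistable(List<String> descriptions) {
        List<String> kept = new ArrayList<>();
        for (String d : descriptions) {
            if (!isSelfTrace(d)) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> spans = Arrays.asList(
                "Executing UPSERT INTO SYSTEM.TRACING_STATS (trace_id, ...",
                "Writing mutation batch for table: SYSTEM.TRACING_STATS ...",
                "ClientService.Scan");
        // Only the span unrelated to the trace table survives the filter.
        System.out.println(filterPersistable(spans));
    }
}
```

A real fix would more likely key off the target table of the mutation than off free-text descriptions; the string match is used here only because the descriptions are the one detail the issue provides.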
[jira] [Updated] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected
[ https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranavan updated PHOENIX-2166:
--
Attachment: PHOENIX-2166.patch
[jira] [Updated] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected
[ https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranavan updated PHOENIX-2166:
--
Attachment: (was: PHOENIX-2166.patch)
[jira] [Updated] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected
[ https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranavan updated PHOENIX-2166:
--
Attachment: PHOENIX-2166.patch

I have added the patch. Can someone review it?
[jira] [Issue Comment Deleted] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected
[ https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranavan updated PHOENIX-2166:
--
Comment: was deleted

(was: Hi Mujtaba Chohan,

Shall we go for a lock-based solution? Then only one can proceed at a time. (The original requirement needs no concurrency for this particular case.))
[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time
[ https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334009#comment-15334009 ]

Pranavan commented on PHOENIX-2178:
---

I made a PR to HTrace: https://github.com/apache/incubator-htrace/pull/10/files. We can merge the Phoenix PR (https://github.com/apache/phoenix/pull/173) once the HTrace PR is merged.

> Tracing - total time listed for a certain trace does not correlate with query
> wall clock time
>
> Key: PHOENIX-2178
> URL: https://issues.apache.org/jira/browse/PHOENIX-2178
> Project: Phoenix
> Issue Type: Sub-task
> Affects Versions: 4.5.0
> Reporter: Mujtaba Chohan
> Labels: gsoc2016, tracing
>
> Wall clock time for a count(*) over a large table is 3+ms; however, the
> total sum(end_time - start_time) is less than 250ms for the trace_id
> generated for this count(*) query.
> {code}
> Output of trace table:
> select sum(end_time - start_time),count(*), description from
> SYSTEM.TRACING_STATS WHERE TRACE_ID=X group by description;
> +------------------------------+----------+--------------------------+
> | SUM((END_TIME - START_TIME)) | COUNT(1) | DESCRIPTION              |
> +------------------------------+----------+--------------------------+
> | 0                            | 3        | ClientService.Scan       |
> | 240                          | 253879   | HFileReaderV2.readBlock  |
> | 1                            | 1        | Scanner opened on server |
> +------------------------------+----------+--------------------------+
> {code}
[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time
[ https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332982#comment-15332982 ]

Pranavan commented on PHOENIX-2178:
---

Colin has opened a JIRA for nanosecond time granularity, and I am working on it. The HTrace JIRA link: https://issues.apache.org/jira/browse/HTRACE-376
[jira] [Commented] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected
[ https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331100#comment-15331100 ]

Pranavan commented on PHOENIX-2166:
---

Hi Mujtaba Chohan,

Shall we go for a lock-based solution? Then only one can proceed at a time. (The original requirement needs no concurrency for this particular case.)
[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time
[ https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328898#comment-15328898 ]

Pranavan commented on PHOENIX-2178:
---

I have created a pull request, but it needs some HTrace-level changes, which I am working on. We can merge it once we have the updated version of HTrace.

PR: https://github.com/apache/phoenix/pull/173
[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time
[ https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327540#comment-15327540 ]

Pranavan commented on PHOENIX-2178:
---

Hi James and Mujtaba Chohan,

I found the cause of this. The difference is huge because the wall clock measures the actual elapsed time, whereas sum(end_time - start_time) adds up per-span differences between millisecond timestamps. Most readBlock spans take only nanoseconds, so their recorded difference is 0. Only a small fraction of readBlock spans end up with a gap of 1 millisecond.

This is the code segment that assigns the start and end times:

    builder.addCounter(Interns.info(START.traceName, EMPTY_STRING), span.getStartTimeMillis());
    builder.addCounter(Interns.info(END.traceName, EMPTY_STRING), span.getStopTimeMillis());

The solution is to store the time in nanoseconds. This gets rid of the rounding errors we are facing now. It will not change the DDL of the tracing table, but the stored values will be in nanoseconds.
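The rounding effect described in the comment above can be reproduced with a small, self-contained sketch. It is not Phoenix or HTrace code: the class `SpanRoundingDemo`, its methods, and the synthetic span layout (1000 spans of 400 microseconds, each starting on a millisecond boundary) are assumptions chosen purely for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo: compares summing span durations computed from
// millisecond-truncated timestamps with summing them in nanoseconds.
final class SpanRoundingDemo {

    /** Builds {start, end} pairs in nanoseconds; span i starts at i milliseconds. */
    static long[][] syntheticSpans(int count, long durationNanos) {
        long[][] spans = new long[count][2];
        for (int i = 0; i < count; i++) {
            long start = i * 1_000_000L;      // start on a millisecond boundary
            spans[i][0] = start;
            spans[i][1] = start + durationNanos;
        }
        return spans;
    }

    /** Sums (end - start) after truncating both endpoints to whole milliseconds. */
    static long sumMillisTruncated(long[][] spansNanos) {
        long total = 0;
        for (long[] s : spansNanos) {
            long startMs = TimeUnit.NANOSECONDS.toMillis(s[0]);
            long endMs = TimeUnit.NANOSECONDS.toMillis(s[1]);
            total += endMs - startMs;
        }
        return total;
    }

    /** Sums (end - start) at full nanosecond resolution. */
    static long sumNanos(long[][] spansNanos) {
        long total = 0;
        for (long[] s : spansNanos) {
            total += s[1] - s[0];
        }
        return total;
    }

    public static void main(String[] args) {
        long[][] spans = syntheticSpans(1000, 400_000L);
        System.out.println("millisecond sum: " + sumMillisTruncated(spans) + " ms");
        System.out.println("nanosecond sum:  "
                + TimeUnit.NANOSECONDS.toMillis(sumNanos(spans)) + " ms");
    }
}
```

With this layout every span starts and ends within the same millisecond, so the millisecond-based sum is 0 while the nanosecond-based sum corresponds to 400 ms. Real spans are not aligned this neatly, but the same truncation explains how many short spans can add up to far less than the wall clock time.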
[jira] [Commented] (PHOENIX-2062) Support COUNT DISTINCT with multiple arguments
[ https://issues.apache.org/jira/browse/PHOENIX-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199843#comment-15199843 ]

Pranavan commented on PHOENIX-2062:
---

Hi! I am Pranavan from the University of Moratuwa, Sri Lanka. I am interested in doing this project. Can someone give further input on this issue?

> Support COUNT DISTINCT with multiple arguments
> --
>
> Key: PHOENIX-2062
> URL: https://issues.apache.org/jira/browse/PHOENIX-2062
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Labels: gsoc2016
>
> I have a situation where I want to count the distinct combination of a couple
> of columns.
> When I try the following:
> select count(distinct a.col1, b.col2)
> from table tab1 a
> inner join tab2 b on b.joincol = a.joincol
> where a.col3 = 'some condition'
> and b.col4 = 'some other condition';
> I get the following error:
> Error: ERROR 605 (42P00): Syntax error. Unknown function: "DISTINCT_COUNT".
> (state=42P00,code=605)
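To make the requested semantics concrete, the sketch below counts distinct combinations of column values over rows held in memory. It illustrates only what the query in the issue is expected to compute, not how Phoenix would implement it. The class name `DistinctPairCount` is hypothetical, and skipping rows that contain a null argument is an assumption modeled on how MySQL treats multi-argument COUNT(DISTINCT ...); the issue does not say how NULLs should be handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of COUNT(DISTINCT col1, col2) semantics.
final class DistinctPairCount {

    /** Counts distinct value combinations; each row holds the argument values. */
    static int countDistinct(List<Object[]> rows) {
        Set<List<Object>> seen = new HashSet<>();
        for (Object[] row : rows) {
            // Assumption: a row with any null argument is not counted.
            boolean hasNull = false;
            for (Object value : row) {
                if (value == null) {
                    hasNull = true;
                    break;
                }
            }
            if (!hasNull) {
                // Lists compare element by element, so equal combinations collapse.
                seen.add(Arrays.asList(row));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] {"x", 1},
                new Object[] {"x", 1},    // duplicate combination
                new Object[] {"x", 2},
                new Object[] {null, 2});  // skipped under the null assumption
        System.out.println(countDistinct(rows));
    }
}
```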