[jira] [Commented] (PHOENIX-1197) Measure the performance impact of enabling tracing

2016-06-21 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341551#comment-15341551
 ] 

Pranavan commented on PHOENIX-1197:
---

Tracing is essentially a specialized form of logging. Logs are primarily consumed 
by system administrators, whereas traces are primarily used by developers. The 
main intention of tracing is to assist developers. 

I found that the following enhancements to Phoenix would be useful.

Enhancements needed in Apache Phoenix
1.   The trace table grows at a rapid rate; this should be stopped, and only 
relevant trace data should be stored.
2.   A permission model is needed for Apache Phoenix tracing.
3.   Tracing should be agile, unlike logs.
4.   Tracing is done at a lower level, so the volume of trace data will be 
higher. Currently only the TRACE ON and TRACE OFF commands are supported. We need to 
add more granular control, because tracing can seriously affect 
performance.
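
One possible shape for the more granular control in item 4 is probabilistic sampling, where only a configurable fraction of requests is traced. The sketch below is only an illustration of that idea: TraceSamplerSketch and shouldTrace are hypothetical names, not existing Phoenix or HTrace classes, and the 1% rate is an arbitrary assumption.

```java
import java.util.Random;

// Hypothetical sketch: TraceSamplerSketch is not a Phoenix or HTrace class.
final class TraceSamplerSketch {
    private final double probability;
    private final Random random;

    public TraceSamplerSketch(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
        this.random = random;
    }

    // Returns true when the current request should be traced.
    public boolean shouldTrace() {
        // nextDouble() is in [0, 1), so 0.0 never traces and 1.0 always traces
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        // Assumed sampling rate of 1%; the seed only makes the run repeatable
        TraceSamplerSketch sampler = new TraceSamplerSketch(0.01, new Random(42));
        int traced = 0;
        for (int i = 0; i < 100_000; i++) {
            if (sampler.shouldTrace()) {
                traced++;
            }
        }
        System.out.println("traced " + traced + " of 100000 requests");
    }
}
```

With a sampler like this, the volume of trace data scales with the chosen probability instead of with the full request rate, which is the kind of control that ON and OFF alone cannot give.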


> Measure the performance impact of enabling tracing
> --
>
> Key: PHOENIX-1197
> URL: https://issues.apache.org/jira/browse/PHOENIX-1197
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>  Labels: tracing
>
> In Phoenix 4.1, there's a new tracing capability. We should measure the 
> impact of enabling this on a live cluster before turning it on in production.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected

2016-06-19 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338911#comment-15338911
 ] 

Pranavan commented on PHOENIX-2166:
---

Hi James

I ran code formatting on all of the files that I changed. That is why 
you are seeing a bunch of changes that are unrelated to my work. I will 
discuss this with my mentor and upload a new patch.

> Prevent writing to tracing table when tracing data collected
> 
>
> Key: PHOENIX-2166
> URL: https://issues.apache.org/jira/browse/PHOENIX-2166
> Project: Phoenix
>  Issue Type: Sub-task
>Affects Versions: 4.5.0
>Reporter: Mujtaba Chohan
>  Labels: gsoc2016, tracing
> Attachments: PHOENIX-2166.patch
>
>
> When tracing is turned ON, the trace table grows at a fast pace and is filled with 
> the following traces, which should not be present:
> {code}
> Executing UPSERT INTO SYSTEM.TRACING_STATS (trace_id, ...
> Writing mutation batch for table: SYSTEM.TRACING_STATS ...
> and so on
> {code}





[jira] [Updated] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected

2016-06-19 Thread Pranavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranavan updated PHOENIX-2166:
--
Attachment: PHOENIX-2166.patch



[jira] [Updated] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected

2016-06-19 Thread Pranavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranavan updated PHOENIX-2166:
--
Attachment: (was: PHOENIX-2166.patch)



[jira] [Updated] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected

2016-06-19 Thread Pranavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranavan updated PHOENIX-2166:
--
Attachment: PHOENIX-2166.patch

I have added the patch. Can someone review it?



[jira] [Issue Comment Deleted] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected

2016-06-19 Thread Pranavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranavan updated PHOENIX-2166:
--
Comment: was deleted

(was: Hi Mujtaba Chohan

Shall we go for a lock-based solution? Then only one can proceed at a time. 
(The original requirement needs no concurrency for this particular case.))



[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time

2016-06-16 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334009#comment-15334009
 ] 

Pranavan commented on PHOENIX-2178:
---

I opened a PR against HTrace - https://github.com/apache/incubator-htrace/pull/10/files. 
We can merge the Phoenix PR - https://github.com/apache/phoenix/pull/173 - after the 
HTrace PR gets merged.

> Tracing - total time listed for a certain trace does not correlate with query 
> wall clock time
> -
>
> Key: PHOENIX-2178
> URL: https://issues.apache.org/jira/browse/PHOENIX-2178
> Project: Phoenix
>  Issue Type: Sub-task
>Affects Versions: 4.5.0
>Reporter: Mujtaba Chohan
>  Labels: gsoc2016, tracing
>
> Wall clock time for a count * over a large table is 3+ms; however, the 
> total sum(end_time - start_time) is less than 250ms for the trace_id generated 
> for this count * query.
> {code}
> Output of trace table:
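> select sum(end_time - start_time), count(*), description from SYSTEM.TRACING_STATS WHERE TRACE_ID=X group by description;
> +------------------------------+----------+--------------------------+
> | SUM((END_TIME - START_TIME)) | COUNT(1) | DESCRIPTION              |
> +------------------------------+----------+--------------------------+
> | 0                            | 3        | ClientService.Scan       |
> | 240                          | 253879   | HFileReaderV2.readBlock  |
> | 1                            | 1        | Scanner opened on server |
> +------------------------------+----------+--------------------------+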
> {code}





[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time

2016-06-15 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332982#comment-15332982
 ] 

Pranavan commented on PHOENIX-2178:
---

Colin has opened a JIRA for nanosecond time granularity. I am working on it. The 
HTrace JIRA link is https://issues.apache.org/jira/browse/HTRACE-376



[jira] [Commented] (PHOENIX-2166) Prevent writing to tracing table when tracing data collected

2016-06-14 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331100#comment-15331100
 ] 

Pranavan commented on PHOENIX-2166:
---

Hi Mujtaba Chohan

Shall we go for a lock-based solution? Then only one can proceed at a time. 
(The original requirement needs no concurrency for this particular case.)



[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time

2016-06-13 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328898#comment-15328898
 ] 

Pranavan commented on PHOENIX-2178:
---

I have created a pull request, but it needs some HTrace-level changes. I am working on 
those. We can merge once we have the updated version of HTrace.

PR - https://github.com/apache/phoenix/pull/173



[jira] [Commented] (PHOENIX-2178) Tracing - total time listed for a certain trace does not correlate with query wall clock time

2016-06-13 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327540#comment-15327540
 ] 

Pranavan commented on PHOENIX-2178:
---

Hi James and Mujtaba Chohan. I found the solution for this. 

The difference is huge because the wall clock measures the actual elapsed time, 
whereas sum(end_time - start_time) adds up the per-span differences, each 
recorded in whole milliseconds. Most of the readBlock traces take only 
nanoseconds, so their difference is recorded as 0. Only a small fraction of 
readBlock traces end up with a gap of 1 millisecond. 

This is the code segment that assigns the start and end times:
builder.addCounter(Interns.info(START.traceName, EMPTY_STRING), span.getStartTimeMillis());
builder.addCounter(Interns.info(END.traceName, EMPTY_STRING), span.getStopTimeMillis());


The solution is to store the time in nanoseconds. This will get rid of the 
round-off errors that we are facing now. It will not change the DDL of the 
tracing table, but the stored values will be nanosecond times.
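
The rounding effect can be reproduced in isolation. The following is a minimal sketch, not Phoenix or HTrace code: SpanRoundingSketch and its methods are hypothetical names, and the span timings (1000 spans of 400 microseconds each, every one starting on a millisecond boundary) are assumed values chosen only to make the effect obvious.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: class and method names are hypothetical, not Phoenix or HTrace code.
final class SpanRoundingSketch {

    // Sum of per-span durations when start and stop are stored as whole milliseconds.
    public static long sumMillisDurations(long[] startNanos, long[] stopNanos) {
        long total = 0;
        for (int i = 0; i < startNanos.length; i++) {
            long startMs = TimeUnit.NANOSECONDS.toMillis(startNanos[i]);
            long stopMs = TimeUnit.NANOSECONDS.toMillis(stopNanos[i]);
            total += stopMs - startMs;
        }
        return total;
    }

    // Sum of per-span durations when start and stop are stored in nanoseconds.
    public static long sumNanoDurations(long[] startNanos, long[] stopNanos) {
        long total = 0;
        for (int i = 0; i < startNanos.length; i++) {
            total += stopNanos[i] - startNanos[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int spans = 1000;
        long[] start = new long[spans];
        long[] stop = new long[spans];
        for (int i = 0; i < spans; i++) {
            // Each span begins on a millisecond boundary and lasts 400 microseconds
            start[i] = i * 1_000_000L;
            stop[i] = start[i] + 400_000L;
        }
        System.out.println("millis sum = " + sumMillisDurations(start, stop) + " ms");
        System.out.println("nanos sum  = "
                + TimeUnit.NANOSECONDS.toMillis(sumNanoDurations(start, stop)) + " ms");
    }
}
```

With these assumed timings the millisecond-based sum is 0 ms while the nanosecond-based sum is 400 ms, because every sub-millisecond span truncates to a zero difference. In real traces spans do not align to millisecond boundaries, so some of them cross a boundary and contribute 1 ms, which matches the small non-zero total seen for readBlock.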



[jira] [Commented] (PHOENIX-2062) Support COUNT DISTINCT with multiple arguments

2016-03-20 Thread Pranavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199843#comment-15199843
 ] 

Pranavan commented on PHOENIX-2062:
---

Hi!!

I am Pranavan from the University of Moratuwa, Sri Lanka. I am interested in doing 
this project. Can someone give further input on this issue?

> Support COUNT DISTINCT with multiple arguments
> --
>
> Key: PHOENIX-2062
> URL: https://issues.apache.org/jira/browse/PHOENIX-2062
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>  Labels: gsoc2016
>
> I have a situation where I want to count the distinct combination of a couple 
> of columns.
> When I try the following:-
> select count(distinct a.col1, b.col2)
> from table tab1 a
> inner join tab2 b on b.joincol = a.joincol
> where a.col3 = 'some condition'
> and b.col4 = 'some other condition';
> I get the following error:-
> Error: ERROR 605 (42P00): Syntax error. Unknown function: "DISTINCT_COUNT". 
> (state=42P00,code=605)


