[jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872503#comment-15872503
 ] 

Thejas M Nair commented on HIVE-15908:
--

Are you testing with the master branch?
HiveStatement.DEFAULT_FETCH_SIZE has been 1000 for a while, but I am not sure 
why that would have an impact.
HIVE-14618 shortened the timeouts for the getOperationStatus long-polling 
calls, which has an effect similar to what Hue is doing. That could be what 
you are hitting.
However, it looks like it didn't change the Beeline sleep timeouts for log 
fetches. We could use a step function for those as well.
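
As a rough illustration of what a step function for the Beeline log-fetch sleep could look like, here is a minimal sketch. The class name, method name, thresholds, and sleep values are all hypothetical and are not taken from HIVE-14618 or from Beeline; only the 1 second upper value mirrors the pause discussed in this thread.

```java
public class SteppedLogFetchDelay {
    /**
     * Hypothetical step function: poll quickly while the query is young
     * and logging heavily, then back off in fixed steps.
     * All thresholds and values below are illustrative assumptions.
     */
    static long sleepMillis(long elapsedMillis) {
        if (elapsedMillis < 5_000) {
            return 100;   // first 5 seconds: short pause
        }
        if (elapsedMillis < 30_000) {
            return 500;   // up to 30 seconds: medium pause
        }
        return 1_000;     // afterwards: the 1 second pause mentioned in this thread
    }

    public static void main(String[] args) {
        long[] samples = {0, 4_999, 5_000, 29_999, 30_000, 120_000};
        for (long elapsed : samples) {
            System.out.println(elapsed + " ms elapsed -> sleep " + sleepMillis(elapsed) + " ms");
        }
    }
}
```

A piecewise-constant schedule like this keeps the extra load on HS2 bounded, since only young queries are polled at the short interval.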



> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 0.13.0
> Reporter: Harsh J
> Priority: Minor
> Attachments: HIVE-15908.000.patch
>
>
> HS2 offers an API to fetch Operation Log results from the maintained 
> OperationLog file. The reader inside the OperationLog$LogFile class reads 
> its input stream line by line, returning whatever lines are available from 
> the OS's file input perspective.
> The writer inside the same class uses a PrintStream to write to the file in 
> parallel. However, the PrintStream constructor used leaves PrintStream's 
> {{autoFlush}} feature turned OFF. This causes the BufferedWriter used by 
> PrintStream to accumulate 8k worth of bytes in memory before flushing the 
> writes to disk, which slows down the logs streamed back to the client. 
> Ideally, every line should be flushed in full as it is written, for a 
> smoother experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as 
> below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and 
> make for a better reader-log-results-streaming experience: 
> https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)
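
The effect of the autoFlush constructor argument can be observed with a small, self-contained sketch. Note the assumptions: the class and method names are hypothetical, and a BufferedOutputStream over an in-memory sink stands in for the log file so that the buffering is visible without touching disk. The line in OperationLog$LogFile wraps a FileOutputStream directly, so this only illustrates what the flag does, not HS2's exact behaviour.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class AutoFlushDemo {
    /** Returns how many bytes have reached the sink right after one println. */
    static int bytesVisibleAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Assumption of this sketch: a buffered layer sits between the
        // PrintStream and the sink, standing in for any buffering on the
        // way to the file.
        PrintStream out = new PrintStream(new BufferedOutputStream(sink), autoFlush);
        out.println("INFO  : Compiling command");
        // No explicit flush() or close() here: we measure what a concurrent
        // reader of the sink could see at this moment.
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=false -> "
                + bytesVisibleAfterPrintln(false) + " bytes visible");
        System.out.println("autoFlush=true  -> "
                + bytesVisibleAfterPrintln(true) + " bytes visible");
    }
}
```

With autoFlush off, the line stays in the buffer until it fills or the stream is flushed or closed; with autoFlush on, each println pushes the line through to the sink.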



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-17 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872048#comment-15872048
 ] 

Harsh J commented on HIVE-15908:


After a bit more testing with some slow-logging queries: aside from this 
newline flush on the server side, increasing the fetch size in HiveStatement to 
a very large value (1000 rows) also helps, as does decreasing the Beeline 
Command class's UI-jarring 1 second pause between fetches to something like 
100-200 ms.

I'm unsure whether such changes are acceptable though, as they'd increase the 
running load on HS2 given overall Beeline usage. FWIW, Hue feels more 
pleasant to use: it polls the query logs with a fetch size of 1000 rows and a 
dynamic refresh sleep time that begins at 100 ms and scales up to 2 s over 
time, in increments of 100 ms (this works better because there is more logging 
at the beginning of the query than around the end).
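
The ramping sleep schedule described above (start at 100 ms, add 100 ms per poll, cap at 2 s) can be sketched as follows. This is not Hue's code; the class, constant, and method names are hypothetical, and only the three numbers come from the description in this comment.

```java
public class RampingPollDelay {
    static final long START_MS = 100;   // first sleep
    static final long STEP_MS = 100;    // increment per poll
    static final long MAX_MS = 2_000;   // upper bound

    /** Sleep to apply before the poll with the given 0-based index. */
    static long delayMillis(int pollIndex) {
        long delay = START_MS + (long) pollIndex * STEP_MS;
        return Math.min(delay, MAX_MS);
    }

    public static void main(String[] args) {
        // Print the schedule until it has clearly reached the cap.
        for (int i = 0; i <= 21; i++) {
            System.out.println("poll " + i + " -> sleep " + delayMillis(i) + " ms");
        }
    }
}
```

Under these numbers the cap is reached at the 20th poll, so the client polls frequently only during the first twenty or so seconds of a query, which is when most of the log output is produced.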

> OperationLog's LogFile writer should have autoFlush turned on
> -
>
> Key: HIVE-15908
> URL: https://issues.apache.org/jira/browse/HIVE-15908
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 0.13.0
> Reporter: Harsh J
> Assignee: Harsh J
> Priority: Minor
> Attachments: HIVE-15908.000.patch


[jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865549#comment-15865549
 ] 

Hive QA commented on HIVE-15908:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852513/HIVE-15908.000.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10238 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3534/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3534/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3534/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852513 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on

2017-02-13 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865281#comment-15865281
 ] 

Harsh J commented on HIVE-15908:


(H/T to Lingesh Radhakrishnan for the discovery)
