Marcus Truscello created ZEPPELIN-5758:
------------------------------------------
Summary: BigQuery hits socket timeout before reaching "wait_time"
setting
Key: ZEPPELIN-5758
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5758
Project: Zeppelin
Issue Type: Bug
Components: interpreter-setting, Interpreters, zeppelin-interpreter
Affects Versions: 0.10.1
Reporter: Marcus Truscello
Attachments: bigquery-timeout.patch, stacktrace.log
The {{zeppelin.bigquery.wait_time}} BigQuery interpreter parameter is only
effective up to a value of 30 seconds. Any larger value exceeds the underlying
HTTP client's default read timeout, so a
{{java.net.SocketTimeoutException: Read timed out}} is thrown before the
configured wait time is ever reached. (A full stack trace is attached.)
Google's Java API guide suggests overriding the {{HttpRequestInitializer}} to
set the desired connect and read timeouts:
[https://developers.google.com/api-client-library/java/google-api-java-client/errors#timeouts]
That exact approach isn't feasible here because the BigQuery interpreter's
{{createAuthorizedClient}} method is static. Instead, we can adapt the
approach from this StackOverflow answer, which uses the builder's
{{setHttpRequestInitializer}}:
[https://stackoverflow.com/a/32894630]
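The shape of that fix is a small decorator: wrap the existing
{{HttpRequestInitializer}} (the credential) in one that also sets explicit
connect/read timeouts, and hand the wrapper to the builder. The sketch below
uses minimal stand-in types so it runs without the google-api-client jar; the
real classes are {{com.google.api.client.http.HttpRequest}} and
{{com.google.api.client.http.HttpRequestInitializer}}, and {{withTimeouts}} is
a hypothetical helper name, not part of the attached patch:

```java
import java.io.IOException;

public class TimeoutSketch {
    // Minimal stand-in for com.google.api.client.http.HttpRequest
    // (hypothetical; only the two setters the fix needs).
    static class HttpRequest {
        int connectTimeout;
        int readTimeout;
        void setConnectTimeout(int ms) { connectTimeout = ms; }
        void setReadTimeout(int ms) { readTimeout = ms; }
    }

    // Stand-in for com.google.api.client.http.HttpRequestInitializer.
    interface HttpRequestInitializer {
        void initialize(HttpRequest request) throws IOException;
    }

    // Wrap an existing initializer (the OAuth credential in the real
    // interpreter code) so every request also gets explicit timeouts.
    // In the actual fix this wrapper would be passed to
    // Bigquery.Builder#setHttpRequestInitializer.
    static HttpRequestInitializer withTimeouts(
            final HttpRequestInitializer wrapped, final int timeoutMs) {
        return request -> {
            if (wrapped != null) {
                wrapped.initialize(request); // keep auth headers etc.
            }
            request.setConnectTimeout(timeoutMs);
            request.setReadTimeout(timeoutMs);
        };
    }

    public static void main(String[] args) throws IOException {
        HttpRequest req = new HttpRequest();
        // A wait_time of 120 s needs a read timeout of at least 120 000 ms.
        withTimeouts(null, 120_000).initialize(req);
        System.out.println(req.readTimeout); // prints 120000
    }
}
```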
It should be noted that setting the read timeout too high likely provides no
additional value. Regardless of the {{timeoutMs}} value, BigQuery always
returns a response within ~200 seconds, whether or not the job has actually
completed:
[https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults#query-parameters]
Given that the BigQuery interpreter doesn't handle {{jobComplete}} being false,
there's no reason to set the read timeout much larger than 200 seconds.
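Given that cap, one option (a hypothetical helper, not part of the attached
patch) would be to clamp whatever {{wait_time}} the user configures before
using it as the read timeout; the 210 s ceiling below is an assumed margin
over the ~200 s server-side limit, not a documented constant:

```java
public class TimeoutClamp {
    // ~200 s server-side cap on getQueryResults plus a small margin
    // (the exact margin is an assumption, not from the BigQuery docs).
    static final int MAX_USEFUL_TIMEOUT_MS = 210_000;

    // Clamp the configured wait_time: anything above the cap just waits
    // longer for a response that will already have arrived.
    static int effectiveReadTimeoutMs(int waitTimeMs) {
        return Math.min(waitTimeMs, MAX_USEFUL_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        System.out.println(effectiveReadTimeoutMs(300_000)); // prints 210000
        System.out.println(effectiveReadTimeoutMs(120_000)); // prints 120000
    }
}
```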
I've attached a diff of the changes I applied to fix this issue. It should be
noted that I am not a Java developer, so I apologize if the solution is a bit
crude. :D
--
This message was sent by Atlassian Jira
(v8.20.10#820010)