Marcus Truscello created ZEPPELIN-5758:
------------------------------------------

             Summary: BigQuery hits socket timeout before reaching "wait_time" setting
                 Key: ZEPPELIN-5758
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5758
             Project: Zeppelin
          Issue Type: Bug
          Components: interpreter-setting, Interpreters, zeppelin-interpreter
    Affects Versions: 0.10.1
            Reporter: Marcus Truscello
         Attachments: bigquery-timeout.patch, stacktrace.log

The {{zeppelin.bigquery.wait_time}} BigQuery interpreter parameter is only useful up to a value of 30 seconds. Anything beyond that exceeds the underlying HTTP client's default read timeout, and a {{java.net.SocketTimeoutException: Read timed out}} exception is thrown. (A full stack trace is attached.)

Google's Java API guide suggests overriding the {{HttpRequestInitializer}} to set the desired connect and read timeouts: [https://developers.google.com/api-client-library/java/google-api-java-client/errors#timeouts]

That exact approach isn't feasible here because the BigQuery interpreter's {{createAuthorizedClient}} method is static. Instead, we can adapt the solution along the lines of this StackOverflow answer, which uses the builder's {{setHttpRequestInitializer}}: [https://stackoverflow.com/a/32894630]

Note that setting the read timeout very high provides little value. Regardless of the {{timeoutMs}} value, BigQuery always returns a response within roughly 200 seconds, whether or not the job has actually completed: [https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults#query-parameters] Given that the BigQuery interpreter doesn't handle {{jobComplete}} being false, there's no reason to set the read timeout much larger than 200 seconds.

I've attached a diff of the changes I applied to fix this issue. I'm not a Java developer, so apologies if the solution is a bit crude. :D

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
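The core of the wrapping approach can be sketched as follows. This is a minimal, self-contained illustration, not the attached patch: the {{HttpRequest}}/{{HttpRequestInitializer}} interfaces below are simplified stand-ins for the Google HTTP client types of the same names, and {{withTimeouts}} is a hypothetical helper. In the real interpreter, the wrapped initializer would be passed to {{Bigquery.Builder}} via {{setHttpRequestInitializer}} so that the existing credential setup is preserved while the timeouts are raised above {{wait_time}}.

```java
public class TimeoutInitializerSketch {

    // Simplified stand-ins for the Google HTTP client's types of the same name.
    interface HttpRequest {
        void setConnectTimeout(int ms);
        void setReadTimeout(int ms);
    }

    interface HttpRequestInitializer {
        void initialize(HttpRequest request);
    }

    /**
     * Hypothetical helper: wraps an existing initializer (e.g. the one carrying
     * credentials) and then applies explicit connect/read timeouts to each request.
     */
    static HttpRequestInitializer withTimeouts(
            HttpRequestInitializer delegate, int connectMs, int readMs) {
        return request -> {
            delegate.initialize(request);       // keep the original setup
            request.setConnectTimeout(connectMs);
            request.setReadTimeout(readMs);     // must exceed wait_time
        };
    }

    public static void main(String[] args) {
        // Fake request that records the timeouts applied to it.
        int[] applied = new int[2];
        HttpRequest fake = new HttpRequest() {
            public void setConnectTimeout(int ms) { applied[0] = ms; }
            public void setReadTimeout(int ms) { applied[1] = ms; }
        };

        // Read timeout a little above BigQuery's ~200 s response cap.
        HttpRequestInitializer init = withTimeouts(r -> {}, 20_000, 210_000);
        init.initialize(fake);

        System.out.println(applied[0] + " " + applied[1]);
    }
}
```

Wrapping (rather than replacing) the initializer matters: the original initializer is what attaches the credentials, so the timeout logic must delegate to it first and only then override the timeouts.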