[jira] [Resolved] (IMPALA-7044) int32 overflow in HdfsTableSink::CreateNewTmpFile()

2018-06-26 Thread Lars Volker (JIRA)


 [ https://issues.apache.org/jira/browse/IMPALA-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Volker resolved IMPALA-7044.
---------------------------------
       Resolution: Fixed
         Assignee: Lars Volker
    Fix Version/s: Impala 3.0
                   Impala 2.13.0

> int32 overflow in HdfsTableSink::CreateNewTmpFile()
> ---------------------------------------------------
>
>                 Key: IMPALA-7044
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7044
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 2.13.0
>            Reporter: Lars Volker
>            Assignee: Lars Volker
>            Priority: Critical
>              Labels: parquet
>             Fix For: Impala 2.13.0, Impala 3.0
>
>         Attachments: ct.sql
>
>
> When writing Parquet files, we compute a minimum block size based on the
> number of columns in the target table in
> [hdfs-parquet-table-writer.cc:916|https://github.com/apache/impala/blob/master/be/src/exec/hdfs-parquet-table-writer.cc?utf8=%E2%9C%93#L916]:
> {noformat}
> 3 * DEFAULT_DATA_PAGE_SIZE * columns_.size();
> {noformat}
> For tables with a large number of columns (more than ~10,900 with the default
> 64 KB data page size), this value exceeds 2 GB. When we pass it to
> {{hdfsOpenFile()}} in {{HdfsTableSink::CreateNewTmpFile()}}, it is implicitly
> narrowed to a signed int32 and overflows to a negative number.
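> To make the failure mode concrete, here is a minimal standalone sketch (assuming the default 64 KB {{DEFAULT_DATA_PAGE_SIZE}}; the 12,000-column table is hypothetical) that reproduces the exact negative value seen in the log below:
> {noformat}
> #include <cstdint>
> #include <iostream>
>
> int main() {
>   const int64_t kDataPageSize = 64 * 1024;  // assumed DEFAULT_DATA_PAGE_SIZE
>   const int64_t num_columns = 12000;        // hypothetical very wide table
>
>   // The 64-bit minimum block size computed above: 2359296000 bytes (~2.2 GB).
>   const int64_t block_size = 3 * kDataPageSize * num_columns;
>
>   // What hdfsOpenFile() receives once the argument is narrowed to tSize
>   // (int32_t): the high bits are dropped and the value wraps negative.
>   const int32_t narrowed = static_cast<int32_t>(block_size);
>
>   std::cout << block_size << " -> " << narrowed << std::endl;
>   // Prints: 2359296000 -> -1935671296
> }
> {noformat}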
> This leads to error messages like the following:
> {noformat}
> I0516 16:13:52.755090 24257 status.cc:125] Failed to open HDFS file for writing: hdfs://localhost:20500/test-warehouse/lv.db/a/_impala_insert_staging/3c417cb973b710ab_803e8980/.3c417cb973b710ab-803e8980_411033576_dir/3c417cb973b710ab-803e8980_271567064_data.0.parq
> Error(255): Unknown error 255
> Root cause: RemoteException: Specified block size is less than configured minimum value (dfs.namenode.fs-limits.min-block-size): -1935671296 < 1024
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2417)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2339)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:764)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:451)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> @  0x187b8b3  impala::Status::Status()
> @  0x1fade89  impala::HdfsTableSink::CreateNewTmpFile()
> @  0x1faeee7  impala::HdfsTableSink::InitOutputPartition()
> @  0x1fb1389  impala::HdfsTableSink::GetOutputPartition()
> @  0x1faf34a  impala::HdfsTableSink::Send()
> @  0x1c91bcd  impala::FragmentInstanceState::ExecInternal()
> @  0x1c8efa5  impala::FragmentInstanceState::Exec()
> @  0x1c9e53f  impala::QueryState::ExecFInstance()
> @  0x1c9cdb2  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x1c9f25d  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1bd6cd4  boost::function0<>::operator()()
> @  0x1ec18f9  impala::Thread::SuperviseThread()
> @  0x1ec9a95  boost::_bi::list5<>::operator()<>()
> @  0x1ec99b9  boost::_bi::bind_t<>::operator()()
> @  0x1ec997c  boost::detail::thread_data<>::run()
> @  0x31a527a  thread_proxy
> @ 0x7f30246a8184  start_thread
> @ 0x7f30243d503d  clone
> {noformat}
> The signature of {{hdfsOpenFile()}} is as follows:
> {noformat}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags, int bufferSize,
>                       short replication, tSize blocksize);
> {noformat}
> {{tSize}} is typedef'd to {{int32_t}}.
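> For reference, the typedef in {{hdfs.h}} (quoted from a recent Hadoop version; the trailing comment may vary slightly between releases):
> {noformat}
> typedef int32_t tSize; /// size of data for read/write io ops
> {noformat}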
> The doc comment of {{hdfsOpenFile()}} is explicit about this:
> {noformat}
> @param blocksize Size of block - pass 0 if you want to use the
> default configured values.  Note that if you want a block size of
> greater than 2 GB, you must use the hdfsStreamBuilder API.
> {noformat}
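> A guard along these lines would prevent the wraparound (a hypothetical sketch; the function name is illustrative and this is not necessarily the committed fix):
> {noformat}
> #include <cstdint>
> #include <limits>
>
> // True iff 'block_size' survives the narrowing to tSize (int32_t)
> // that happens at the hdfsOpenFile() call boundary.
> bool FitsInHdfsBlockSize(int64_t block_size) {
>   return block_size > 0 &&
>          block_size <= std::numeric_limits<int32_t>::max();
> }
> {noformat}
> {{HdfsTableSink::CreateNewTmpFile()}} could check this before the call and cap the value (or return an error) instead of letting the cast wrap around.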
