[ https://issues.apache.org/jira/browse/IMPALA-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231313#comment-17231313 ]
Zoltán Borók-Nagy commented on IMPALA-10310:
--------------------------------------------

[~guojingfeng] thanks again for your fix. Can we close this issue now?

> Couldn't skip rows in parquet file
> ----------------------------------
>
> Key: IMPALA-10310
> URL: https://issues.apache.org/jira/browse/IMPALA-10310
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: guojingfeng
> Assignee: guojingfeng
> Priority: Critical
>
> When an hdfs-parquet-scanner thread is assigned ScanRanges that contain
> multiple RowGroups, the skip-rows logic with PageIndex fails.
> Below is the error log:
> {code:java}
> I1028 17:59:16.694046 1414911 status.cc:68] 1447f227b73a4d78:92d9a82600000fd1] Could not read definition level, even though metadata states there are 0 values remaining in data page. file=hdfs://path/to/file
>     @ 0xbf4286
>     @ 0x17bc0eb
>     @ 0x17737f7
>     @ 0x1773a0e
>     @ 0x1773d8a
>     @ 0x1774028
>     @ 0x17b9517
>     @ 0x174a22b
>     @ 0x17526fe
>     @ 0x140a78a
>     @ 0x1525908
>     @ 0x1526a03
>     @ 0x10e6169
>     @ 0x10e84c9
>     @ 0x10c7a86
>     @ 0x13750ba
>     @ 0x1375f89
>     @ 0x1b49679
>     @ 0x7ffb2eee1e24
>     @ 0x7ffb2bad935c
> I1028 17:59:16.694074 1414911 status.cc:126] 1447f227b73a4d78:92d9a82600000fd1] Couldn't skip rows in file hdfs://path/to/file
>     @ 0xbf5259
>     @ 0x1773a8a
>     @ 0x1773d8a
>     @ 0x1774028
>     @ 0x17b9517
>     @ 0x174a22b
>     @ 0x17526fe
>     @ 0x140a78a
>     @ 0x1525908
>     @ 0x1526a03
>     @ 0x10e6169
>     @ 0x10e84c9
>     @ 0x10c7a86
>     @ 0x13750ba
>     @ 0x1375f89
>     @ 0x1b49679
>     @ 0x7ffb2eee1e24
>     @ 0x7ffb2bad935c
> I1028 17:59:16.694101 1414911 runtime-state.cc:207] 1447f227b73a4d78:92d9a82600000fd1] Error from query 1447f227b73a4d78:92d9a82600000000: Couldn't skip rows in file hdfs://path/to/file.
> {code}
> On a debug build the error log is:
> {code:java}
> F1030 14:06:38.700459 3148733 parquet-column-readers.cc:1258] 994968c01171b0bc:eab92b3f0000000a] Check failed: num_buffered_values_ >= num_rows (20000 vs. 40000)
> *** Check failure stack trace: ***
>     @ 0x4e9322c google::LogMessage::Fail()
>     @ 0x4e94ad1 google::LogMessage::SendToLog()
>     @ 0x4e92c06 google::LogMessage::Flush()
>     @ 0x4e961cd google::LogMessageFatal::~LogMessageFatal()
>     @ 0x2bfa2c3 impala::BaseScalarColumnReader::SkipTopLevelRows()
>     @ 0x2bf9fcc impala::BaseScalarColumnReader::StartPageFiltering()
>     @ 0x2bf99b4 impala::BaseScalarColumnReader::ReadDataPage()
>     @ 0x2bfbad8 impala::BaseScalarColumnReader::NextPage()
>     @ 0x2c5bc8c impala::ScalarColumnReader<>::ReadValueBatch<>()
>     @ 0x2c1a67a impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch()
>     @ 0x2bae010 impala::HdfsParquetScanner::AssembleRows()
>     @ 0x2ba8934 impala::HdfsParquetScanner::GetNextInternal()
>     @ 0x2ba68ac impala::HdfsParquetScanner::ProcessSplit()
>     @ 0x27d8d0b impala::HdfsScanNode::ProcessSplit()
>     @ 0x27d7ee0 impala::HdfsScanNode::ScannerThread()
>     @ 0x27d723d _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
>     @ 0x27d9831 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
>     @ 0x1fc4d9b boost::function0<>::operator()()
>     @ 0x258590e impala::Thread::SuperviseThread()
>     @ 0x258db92 boost::_bi::list5<>::operator()<>()
>     @ 0x258dab6 boost::_bi::bind_t<>::operator()()
>     @ 0x258da79 boost::detail::thread_data<>::run()
>     @ 0x3db95c9 thread_proxy
>     @ 0x7febc66e6e24 start_thread
>     @ 0x7febc313135c __clone
> Picked up JAVA_TOOL_OPTIONS: -Xms34359738368 -Xmx34359738368 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/28ecfee554b03954bac9e77a73f4ce0c_pid2802027.hprof
> Wrote minidump to /path/to/minidumps/74dae046-c19d-4ad5-ea2603ae-ff139f7e.dmp
> {code}
>
> All Parquet files are generated by Spark with the default row group size
> of 128MB.
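
For readers unfamiliar with the failed check in the debug log (`num_buffered_values_ >= num_rows`, 20000 vs. 40000), here is a minimal sketch of the invariant it expresses: a skip confined to the current data page cannot cover more rows than that page still buffers. `ToyColumnReader` and all of its members are hypothetical names made up for illustration. This is not Impala's `BaseScalarColumnReader` and not the fix that was applied for this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a column reader (illustration only).
struct ToyColumnReader {
  std::vector<int64_t> page_value_counts;  // values stored in each data page
  std::size_t page_idx = 0;                // index of the next page to load
  int64_t num_buffered_values = 0;         // values left in the current page

  // Loads the next page; returns false when no pages remain.
  bool NextPage() {
    if (page_idx >= page_value_counts.size()) return false;
    num_buffered_values = page_value_counts[page_idx++];
    return true;
  }

  // Skips rows inside the current page only. Refuses the request when the
  // page holds fewer values than asked for, which is the condition the
  // failed check in the log guards against.
  bool SkipRowsInPage(int64_t num_rows) {
    if (num_buffered_values < num_rows) return false;
    num_buffered_values -= num_rows;
    return true;
  }

  // Skips rows across page boundaries by consuming at most one page's
  // worth of buffered values per iteration.
  bool SkipRows(int64_t num_rows) {
    while (num_rows > 0) {
      if (num_buffered_values == 0 && !NextPage()) return false;
      const int64_t n = std::min(num_rows, num_buffered_values);
      assert(n <= num_buffered_values);  // per-page invariant always holds here
      SkipRowsInPage(n);
      num_rows -= n;
    }
    return true;
  }
};
```

With two pages of 20000 values each, `SkipRowsInPage(40000)` is rejected after the first page is loaded, while `SkipRows(40000)` succeeds by crossing the page boundary.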