[jira] [Commented] (IMPALA-9815) Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000.xxxx during build

2020-10-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213622#comment-17213622
 ] 

ASF subversion and git services commented on IMPALA-9815:
-

Commit 481ea4ab0d476a4aa491f99c2a4e376faddc0b03 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=481ea4a ]

IMPALA-9815: Update URL for cdh-releases-rcs maven repo

We use repository.cloudera.com to get some Cloudera-patched
dependencies required by the CDP Hadoop dependencies (e.g.
log4j, logredactor, etc). The URL for repository.cloudera.com
has changed from repository.cloudera.com/content/* to
repository.cloudera.com/artifactory/*. It is possible that the
old URL will be restored. To get things working, this updates
the cdh-releases-rcs repository to the new URL.

It turns out that cdh-releases-rcs contains all the
artifacts that we would otherwise get from the third-party
repository, so this replaces third-party with cdh-releases-rcs.

Testing:
 - Ran build-all-options-ub1604

Change-Id: I438305565a1e6b7515408a701e9f9e31f7cfd679
Reviewed-on: http://gerrit.cloudera.org:8080/16594
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000. 
> during build
> -
>
> Key: IMPALA-9815
> URL: https://issues.apache.org/jira/browse/IMPALA-9815
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: mvn.1602463897.937399486.log
>
>
> This is an intermittent failure; sometimes 
> org.apache.hive:hive-exec:jar:3.1.3000 fails to be downloaded, breaking the 
> build. One telltale sign is a build failure happening early, at about 5 
> minutes into the build. The build error signature is:
> {code}
> 05:36:55 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: 
> Could not resolve dependencies for project 
> org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to 
> collect dependencies for [org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112 
> (compile)]: Failed to read artifact descriptor for 
> org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112: Could not transfer 
> artifact org.apache.hive:hive-exec:pom:3.1.3000.7.2.1.0-112 from/to 
> impala.cdh.repo 
> (https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven):
>  Access denied to: 
> https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven/org/apache/hive/hive-exec/3.1.3000.7.2.1.0-112/hive-exec-3.1.3000.7.2.1.0-112.pom,
>  ReasonPhrase:Forbidden. -> [Help 1]
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 05:36:55 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 05:36:55 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 05:36:55 mvn -U -s 
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala-auxiliary-tests/jenkins/m2-settings.xml
>  -U -B install -DskipTests exited with code 0
> 05:36:55 make[2]: *** [shaded-deps/CMakeFiles/shaded-deps] Error 1
> 05:36:55 make[1]: *** [shaded-deps/CMakeFiles/shaded-deps.dir/all] Error 2
> 05:36:55 make[1]: *** Waiting for unfinished jobs
> {code}






[jira] [Created] (IMPALA-10236) Queries Stuck if catalog topic update compression fails

2020-10-13 Thread Shant Hovsepian (Jira)
Shant Hovsepian created IMPALA-10236:


 Summary: Queries Stuck if catalog topic update compression fails
 Key: IMPALA-10236
 URL: https://issues.apache.org/jira/browse/IMPALA-10236
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 2.12.0
Reporter: Shant Hovsepian


If a serialized catalog object exceeds the ~2GB compression buffer limit, an
error status is returned.

 
{code:java}
/// Compresses a serialized catalog object using LZ4 and stores it back in 'dst'.
/// Stores the size of the uncompressed catalog object in the first
/// sizeof(uint32_t) bytes of 'dst'. The compression fails if the uncompressed
/// data size exceeds 0x7E00 bytes.
Status CompressCatalogObject(const uint8_t* src, uint32_t size, std::string* dst)
    WARN_UNUSED_RESULT;

{code}
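As a side note, here is a small self-contained sketch of the buffer layout that
comment describes (a sizeof(uint32_t) uncompressed-size prefix followed by the
compressed payload). The helper names are made up for illustration and are not
Impala code:
{code:cpp}
// Illustrative only: pack/unpack the "uncompressed size prefix + compressed
// payload" layout described by the CompressCatalogObject() comment above.
#include <cstdint>
#include <cstring>
#include <string>

void PackWithSizePrefix(uint32_t uncompressed_size, const std::string& compressed,
    std::string* dst) {
  dst->resize(sizeof(uint32_t) + compressed.size());
  std::memcpy(&(*dst)[0], &uncompressed_size, sizeof(uint32_t));
  std::memcpy(&(*dst)[sizeof(uint32_t)], compressed.data(), compressed.size());
}

uint32_t ReadUncompressedSizePrefix(const std::string& dst) {
  uint32_t size = 0;
  std::memcpy(&size, dst.data(), sizeof(uint32_t));
  return size;
}
{code}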
 

CatalogServer::AddPendingTopicItem() calls CompressCatalogObject()

 
{code:java}
// Add a catalog update to pending_topic_updates_.
extern "C"
JNIEXPORT jboolean JNICALL
Java_org_apache_impala_service_FeSupport_NativeAddPendingTopicItem(JNIEnv* env,
    jclass caller_class, jlong native_catalog_server_ptr, jstring key, jlong version,
    jbyteArray serialized_object, jboolean deleted) {
  std::string key_string;
  {
    JniUtfCharGuard key_str;
    if (!JniUtfCharGuard::create(env, key, &key_str).ok()) {
      return static_cast<jboolean>(false);
    }
    key_string.assign(key_str.get());
  }
  JniScopedArrayCritical obj_buf;
  if (!JniScopedArrayCritical::Create(env, serialized_object, &obj_buf)) {
    return static_cast<jboolean>(false);
  }
  reinterpret_cast<CatalogServer*>(native_catalog_server_ptr)->
      AddPendingTopicItem(std::move(key_string), version, obj_buf.get(),
          static_cast<uint32_t>(obj_buf.size()), deleted);
  return static_cast<jboolean>(true);
}

{code}
However, this JNI wrapper disregards the return value of AddPendingTopicItem().

The return value handling on the Java side was recently updated as part of IMPALA-10076:
{code:java}
-    if (!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, v1Key,
-        obj.catalog_version, data, delete)) {
+    int actualSize = FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr,
+        v1Key, obj.catalog_version, data, delete);
+    if (actualSize < 0) {
       LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + ", delete="
           + delete + ", data_size=" + data.length);
+    } else if (summary != null && obj.type == HDFS_PARTITION) {
+      summary.update(true, delete, obj.hdfs_partition.partition_name,
+          obj.catalog_version, data.length, actualSize);
     }
   }
{code}
CatalogServiceCatalog::addCatalogObject() only logs an error message, and the
catalog update doesn't go through.
{code:java}
  if (topicMode_ == TopicMode.FULL || topicMode_ == TopicMode.MIXED) {
    String v1Key = CatalogServiceConstants.CATALOG_TOPIC_V1_PREFIX + key;
    byte[] data = serializer.serialize(obj);
    int actualSize = FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr,
        v1Key, obj.catalog_version, data, delete);
    if (actualSize < 0) {
      LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + ", delete="
          + delete + ", data_size=" + data.length);
    } else if (summary != null && obj.type == HDFS_PARTITION) {
      summary.update(true, delete, obj.hdfs_partition.partition_name,
          obj.catalog_version, data.length, actualSize);
    }
  }

{code}
I'm not sure what the right behavior would be; we could handle the compression
failure and try more aggressive compression, or unblock the catalog update so
queries don't get stuck.
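For illustration, here is a minimal, self-contained sketch of the "surface the
failure to the caller" option, so the Java side can log, retry, or fail the DDL
instead of the update silently disappearing. This is not Impala's actual code;
the names and the 2 GB constant are assumptions based on the comment above:
{code:cpp}
// Stand-ins for the real types; the point is only the control flow: a failed
// compression returns a negative size instead of being ignored.
#include <cstdint>
#include <string>
#include <vector>

// Assumed limit, per the "~2GB buffer" note above.
constexpr uint64_t kMaxUncompressedSize = 2ULL * 1024 * 1024 * 1024;

// Stand-in for CompressCatalogObject(): fails when the input is too large.
bool CompressItem(const std::string& in, std::string* out) {
  if (in.size() > kMaxUncompressedSize) return false;
  *out = in;  // pretend-compress
  return true;
}

// Returns the size of the queued (compressed) item, or -1 on failure so the
// JNI wrapper / Java caller can react instead of dropping the update silently.
int64_t AddPendingTopicItem(std::vector<std::string>* pending_updates,
    const std::string& serialized_object) {
  std::string compressed;
  if (!CompressItem(serialized_object, &compressed)) return -1;
  pending_updates->push_back(std::move(compressed));
  return static_cast<int64_t>(pending_updates->back().size());
}
{code}
This lines up with the Java-side change quoted above from IMPALA-10076, which
already logs when the returned size is negative.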

 






[jira] [Commented] (IMPALA-10102) Impalad crashes when writing a parquet file with large rows

2020-10-13 Thread Yida Wu (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213527#comment-17213527
 ] 

Yida Wu commented on IMPALA-10102:
--

The value of 'max_compressed_size' when it crashes is around 200-300MB, which 
is reasonable. The stack is always the same.

I have done some tests under different circumstances. If there is enough memory
(for example, running in a VM with 256GB of memory), the issue doesn't appear; if
the mem_limit startup option is set lower than needed, the query fails without
crashing.

In summary, the issue happens when memory is scarce: the process is either killed,
or crashes because it dereferences a nullptr allocated from the pool. We could use
TryAllocate to avoid the crash, but the process would still probably be killed
later due to OOM, and using TryAllocate on the normal path could lower efficiency.
Another option is to treat this as a configuration issue, where the solution is to
set a proper mem_limit so the OOM doesn't happen.
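For reference, a minimal sketch of the TryAllocate pattern described above; the
pool and result types here are simplified stand-ins, not Impala's MemPool/Status
API:
{code:cpp}
// Sketch only: turn a failed allocation into a query error instead of
// dereferencing a null pointer (the crash mode described above).
#include <cstdint>
#include <string>

struct FakePool {
  // Stand-in for a pool allocation that returns nullptr when the memory
  // limit would be exceeded.
  uint8_t* TryAllocate(int64_t size) {
    (void)size;
    return nullptr;  // simulate memory pressure
  }
};

struct Result {
  bool ok;
  std::string msg;
};

Result AllocateCompressionBuffer(FakePool* pool, int64_t max_compressed_size,
    uint8_t** buffer) {
  *buffer = pool->TryAllocate(max_compressed_size);
  if (*buffer == nullptr) {
    // Fail the query cleanly rather than passing nullptr to the compressor.
    return {false, "Could not allocate compression buffer under memory pressure"};
  }
  return {true, ""};
}
{code}
As noted above, this only avoids the SIGSEGV; under sustained memory pressure the
process can still be OOM-killed unless mem_limit is configured appropriately.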

> Impalad crashes when writing a parquet file with large rows
> -
>
> Key: IMPALA-10102
> URL: https://issues.apache.org/jira/browse/IMPALA-10102
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Yida Wu
>Priority: Critical
>  Labels: crash
>
> Encountered a crash when testing following queries on my local branch:
> {code:sql}
> create table bigstrs3 stored as parquet as
> select *, repeat(uuid(), cast(random() * 20 as int)) as bigstr
> from functional.alltypes
> limit 1000;
> # Length of uuid() is 36. So the max row size is 7,200,000.
> set MAX_ROW_SIZE=8m;
> create table my_str_group stored as parquet as
>   select group_concat(string_col) as ss, bigstr
>   from bigstrs3 group by bigstr;
> create table my_cnt stored as parquet as
>   select count(*) as cnt, bigstr
>   from bigstrs3 group by bigstr;
> {code}
> The crash stacktrace:
> {code}
> Crash reason:  SIGSEGV
> Crash address: 0x0
> Process uptime: not available
> Thread 336 (crashed)
>  0  libc-2.23.so + 0x14e10b
>  1  impalad!snappy::UncheckedByteArraySink::Append(char const*, unsigned 
> long) [clone .localalias.0] + 0x1a 
>  2  impalad!snappy::Compress(snappy::Source*, snappy::Sink*) + 0xb1 
>  3  impalad!snappy::RawCompress(char const*, unsigned long, char*, unsigned 
> long*) + 0x51 
>  4  impalad!impala::SnappyCompressor::ProcessBlock(bool, long, unsigned char 
> const*, long*, unsigned char**) [compress.cc : 295 + 0x24]
>  5  impalad!impala::Codec::ProcessBlock32(bool, int, unsigned char const*, 
> int*, unsigned char**) [codec.cc : 211 + 0x41]
>  6  impalad!impala::HdfsParquetTableWriter::BaseColumnWriter::Flush(long*, 
> long*, long*) [hdfs-parquet-table-writer.cc : 775 + 0x56]
>  7  impalad!impala::HdfsParquetTableWriter::FlushCurrentRowGroup() 
> [hdfs-parquet-table-writer.cc : 1330 + 0x60]
>  8  impalad!impala::HdfsParquetTableWriter::Finalize() 
> [hdfs-parquet-table-writer.cc : 1297 + 0x19]
>  9  
> impalad!impala::HdfsTableSink::FinalizePartitionFile(impala::RuntimeState*, 
> impala::OutputPartition*) [hdfs-table-sink.cc : 652 + 0x2e]
> 10  
> impalad!impala::HdfsTableSink::WriteRowsToPartition(impala::RuntimeState*, 
> impala::RowBatch*, std::pair std::default_delete >, std::vector std::allocator > >*) [hdfs-table-sink.cc : 282 + 0x21]
> 11  impalad!impala::HdfsTableSink::Send(impala::RuntimeState*, 
> impala::RowBatch*) [hdfs-table-sink.cc : 621 + 0x2e]
> 12  impalad!impala::FragmentInstanceState::ExecInternal() 
> [fragment-instance-state.cc : 422 + 0x58]
> 13  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc 
> : 106 + 0x16]
> 14  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> [query-state.cc : 836 + 0x19]
> 15  impalad!impala::QueryState::StartFInstances()::{lambda()#1}::operator()() 
> const + 0x26 
> 16  
> impalad!boost::detail::function::void_function_obj_invoker0,
>  void>::invoke [function_template.hpp : 159 + 0xc] 
> 17  impalad!boost::function0::operator()() const [function_template.hpp 
> : 770 + 0x1d]
> 18  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) [thread.cc : 360 + 0xf]
> 19  impalad!void 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> 
> >::operator() std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*), 
> boost::_bi::list0>(boost::_bi::t

[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally

2020-10-13 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213496#comment-17213496
 ] 

Tim Armstrong commented on IMPALA-9884:
---

It looks like this happened on an exhaustive release build, so I'll loop to see 
if I can repro and collect logs.


> TestAdmissionControllerStress.test_mem_limit failing occasionally
> -
>
> Key: IMPALA-9884
> URL: https://issues.apache.org/jira/browse/IMPALA-9884
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0
>Reporter: Vihang Karajgaonkar
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
>
> Recently, I saw this test failing with the exception trace below. 
> {noformat}
> custom_cluster/test_admission_controller.py:1782: in test_mem_limit
> {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
> custom_cluster/test_admission_controller.py:1638: in run_admission_test
> assert metric_deltas['dequeued'] == 0,\
> E   AssertionError: Queued queries should not run until others are made to 
> finish
> E   assert 1 == 0
> {noformat}






[jira] [Resolved] (IMPALA-10195) TestAdmissionControllerStress test_mem_limit run_admission_test looks flaky again

2020-10-13 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10195.

Resolution: Fixed

> TestAdmissionControllerStress test_mem_limit run_admission_test looks flaky 
> again
> -
>
> Key: IMPALA-10195
> URL: https://issues.apache.org/jira/browse/IMPALA-10195
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Bikramjeet Vig
>Priority: Blocker
>
> custom_cluster.test_admission_controller.TestAdmissionControllerStress.test_mem_limit
>  fails with a familiar assertion:
> {code}
> custom_cluster/test_admission_controller.py:1856: in test_mem_limit
> {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
> custom_cluster/test_admission_controller.py:1754: in run_admission_test
> assert metric_deltas['admitted'] >= expected_admitted
> E   assert 0 >= 5
> {code}
> The failure looks eerily similar to IMPALA-8342. That one was closed 18 
> months ago, so I'm hesitant to reopen it, but it is linked as related.
> Tagging [~bikramjeet.vig] and [~tarmstr...@cloudera.com] as they looked
> at the same flakiness at that time.






[jira] [Commented] (IMPALA-9815) Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000.xxxx during build

2020-10-13 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213342#comment-17213342
 ] 

Tim Armstrong commented on IMPALA-9815:
---

I'm going to try and unblock by switching repos - 
https://gerrit.cloudera.org/#/c/16595/

> Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000. 
> during build
> -
>
> Key: IMPALA-9815
> URL: https://issues.apache.org/jira/browse/IMPALA-9815
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: mvn.1602463897.937399486.log
>
>
> This is an intermittent failure; sometimes 
> org.apache.hive:hive-exec:jar:3.1.3000 fails to be downloaded, breaking the 
> build. One telltale sign is a build failure happening early, at about 5 
> minutes into the build. The build error signature is:
> {code}
> 05:36:55 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: 
> Could not resolve dependencies for project 
> org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to 
> collect dependencies for [org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112 
> (compile)]: Failed to read artifact descriptor for 
> org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112: Could not transfer 
> artifact org.apache.hive:hive-exec:pom:3.1.3000.7.2.1.0-112 from/to 
> impala.cdh.repo 
> (https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven):
>  Access denied to: 
> https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven/org/apache/hive/hive-exec/3.1.3000.7.2.1.0-112/hive-exec-3.1.3000.7.2.1.0-112.pom,
>  ReasonPhrase:Forbidden. -> [Help 1]
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 05:36:55 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 05:36:55 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 05:36:55 mvn -U -s 
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala-auxiliary-tests/jenkins/m2-settings.xml
>  -U -B install -DskipTests exited with code 0
> 05:36:55 make[2]: *** [shaded-deps/CMakeFiles/shaded-deps] Error 1
> 05:36:55 make[1]: *** [shaded-deps/CMakeFiles/shaded-deps.dir/all] Error 2
> 05:36:55 make[1]: *** Waiting for unfinished jobs
> {code}






[jira] [Commented] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213330#comment-17213330
 ] 

Tim Armstrong commented on IMPALA-10235:


I wonder if I broke this somehow with my recent profile refactor...

> Averaged timer profile counters can be negative for trivial queries
> ---
>
> Key: IMPALA-10235
> URL: https://issues.apache.org/jira/browse/IMPALA-10235
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: newbie, ramp-up
> Attachments: profile-output.txt
>
>
> Steps to reproduce on master:
> {code:java}
> stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
>  [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
> limit 25" -p > profile-output.txt
> ...
> Query: select sleep(100) from functional.alltypes limit 25
> Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
> http://stakiar-desktop:25000)
> Query progress can be monitored at: 
> http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
> Fetched 25 row(s) in 2.64s
> {code}
> Attached the contents of {{profile-output.txt}}
> Relevant portion of the profile:
> {code:java}
> Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
> 0.01%)
> ...
>- CompletionTime: -1665218428.000ns
> ...
>- TotalThreadsTotalWallClockTime: -1686005515.000ns
>  - TotalThreadsSysTime: 0.000ns
>  - TotalThreadsUserTime: 2.151ms
> ...
>- TotalTime: -1691524485.000ns
> {code}
> For whatever reason, this only affects the averaged fragment profile. For 
> this query, there was only one coordinator fragment and thus only one 
> fragment instance. The coordinator fragment instance showed normal timer 
> values:
> {code:java}
> Coordinator Fragment F00:
> ...
>  - CompletionTime: 2s629ms
> ...
>  - TotalThreadsTotalWallClockTime: 2s608ms
>- TotalThreadsSysTime: 0.000ns
>- TotalThreadsUserTime: 2.151ms
> ...
>  - TotalTime: 2s603ms
> {code}
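Not from the thread, just a quick arithmetic check of the quoted numbers: the
negative averaged values are exactly what the per-instance nanosecond counters
look like after being truncated to a signed 32-bit integer (e.g. 2s629ms is about
2,629,748,868 ns, which wraps to -1,665,218,428), which at least suggests a 32-bit
truncation somewhere in the averaging path. A tiny sketch of that truncation, with
the nanosecond inputs reconstructed by adding 2^32 to the printed negatives:
{code:cpp}
// Assumption: the averaged counters go through a signed 32-bit truncation.
// The values below match the single instance's 2s629ms / 2s608ms / 2s603ms.
// (The out-of-range conversion is two's-complement wrap on mainstream
// compilers, and well-defined since C++20.)
#include <cstdint>
#include <iostream>

int main() {
  int64_t ns[] = {2629748868LL, 2608961781LL, 2603442811LL};
  for (int64_t v : ns) {
    // Prints -1665218428, -1686005515, -1691524485 -- the values shown in the
    // averaged fragment profile above.
    std::cout << v << " ns -> " << static_cast<int32_t>(v) << std::endl;
  }
  return 0;
}
{code}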






[jira] [Commented] (IMPALA-9815) Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000.xxxx during build

2020-10-13 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213307#comment-17213307
 ] 

Joe McDonnell commented on IMPALA-9815:
---

Never mind: for the Maven plugins we only list third-party and not the CDP S3
location, so uploading won't help. We would need to change code to get this
going.
[https://github.com/apache/impala/blob/master/impala-parent/pom.xml#L220-L227]

> Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000. 
> during build
> -
>
> Key: IMPALA-9815
> URL: https://issues.apache.org/jira/browse/IMPALA-9815
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: mvn.1602463897.937399486.log
>
>
> This is an intermittent failure; sometimes 
> org.apache.hive:hive-exec:jar:3.1.3000 fails to be downloaded, breaking the 
> build. One telltale sign is a build failure happening early, at about 5 
> minutes into the build. The build error signature is:
> {code}
> 05:36:55 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: 
> Could not resolve dependencies for project 
> org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to 
> collect dependencies for [org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112 
> (compile)]: Failed to read artifact descriptor for 
> org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112: Could not transfer 
> artifact org.apache.hive:hive-exec:pom:3.1.3000.7.2.1.0-112 from/to 
> impala.cdh.repo 
> (https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven):
>  Access denied to: 
> https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven/org/apache/hive/hive-exec/3.1.3000.7.2.1.0-112/hive-exec-3.1.3000.7.2.1.0-112.pom,
>  ReasonPhrase:Forbidden. -> [Help 1]
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 05:36:55 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 05:36:55 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 05:36:55 mvn -U -s 
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala-auxiliary-tests/jenkins/m2-settings.xml
>  -U -B install -DskipTests exited with code 0
> 05:36:55 make[2]: *** [shaded-deps/CMakeFiles/shaded-deps] Error 1
> 05:36:55 make[1]: *** [shaded-deps/CMakeFiles/shaded-deps.dir/all] Error 2
> 05:36:55 make[1]: *** Waiting for unfinished jobs
> {code}






[jira] [Commented] (IMPALA-9815) Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000.xxxx during build

2020-10-13 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213302#comment-17213302
 ] 

Joe McDonnell commented on IMPALA-9815:
---

Got past the cdh-release-rcs dependencies, but 
[https://repository.cloudera.com/content/repositories/third-party] is also 
missing. I uploaded cup-maven-plugin and czt-parent, trying again.

> Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000. 
> during build
> -
>
> Key: IMPALA-9815
> URL: https://issues.apache.org/jira/browse/IMPALA-9815
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: mvn.1602463897.937399486.log
>
>
> This is an intermittent failure; sometimes 
> org.apache.hive:hive-exec:jar:3.1.3000 fails to be downloaded, breaking the 
> build. One telltale sign is a build failure happening early, at about 5 
> minutes into the build. The build error signature is:
> {code}
> 05:36:55 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: 
> Could not resolve dependencies for project 
> org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to 
> collect dependencies for [org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112 
> (compile)]: Failed to read artifact descriptor for 
> org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112: Could not transfer 
> artifact org.apache.hive:hive-exec:pom:3.1.3000.7.2.1.0-112 from/to 
> impala.cdh.repo 
> (https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven):
>  Access denied to: 
> https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven/org/apache/hive/hive-exec/3.1.3000.7.2.1.0-112/hive-exec-3.1.3000.7.2.1.0-112.pom,
>  ReasonPhrase:Forbidden. -> [Help 1]
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 05:36:55 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 05:36:55 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 05:36:55 mvn -U -s 
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala-auxiliary-tests/jenkins/m2-settings.xml
>  -U -B install -DskipTests exited with code 0
> 05:36:55 make[2]: *** [shaded-deps/CMakeFiles/shaded-deps] Error 1
> 05:36:55 make[1]: *** [shaded-deps/CMakeFiles/shaded-deps.dir/all] Error 2
> 05:36:55 make[1]: *** Waiting for unfinished jobs
> {code}






[jira] [Updated] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10235:
--
Description: 
Steps to reproduce on master:
{code:java}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}
Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:
{code:java}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}
For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. The coordinator fragment instance showed normal timer values:
{code:java}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}

  was:
Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}


> Averaged timer profile counters can be negative for trivial queries
> ---
>
> Key: IMPALA-10235
> URL: https://issues.apache.org/jira/browse/IMPALA-10235
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: newbie, ramp-up
> Attachments: profile-output.txt
>
>
> Steps to reproduce on master:
> {code:java}
> stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
>  [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
> limit 25" -p > profile-output.txt
> ...
> Query: select sleep(100) from functional.alltypes limit 25
> Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
> http://stakiar-desktop:25000)
> Query progress can be monitored at: 
> http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
> Fetched 25 row(s) in 2.64s
> {code}
> Attached the contents of {{profile-output.txt}}
> Relevant portion of the profile:
> {code:java}
> Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
> 0.01%)
> ...
>- CompletionTime: -1665218428.000ns
> ...
>- TotalThreadsTotalWallClockTime: -1686005515.000ns
>  - TotalThreadsSysTime: 0.000ns
>  - TotalThreadsUserTime: 2.151ms
> ...
>- TotalTime: -1691524485.000ns
> {code}
> For whatever reason, this only affects the averaged fragment profile. For 
> this query, there was only one coordinator fragment and thus only one 
> fragment instance. The coordinator fragment instance showed normal timer 
> values:
> {code:java}
> Coordinator Fragment F00:
> ...
>  - CompletionTime: 2s629ms
> ...
>  - TotalThreadsTotalWallClockTime: 2s608ms
>- TotalThreadsSysTime: 0.000ns
>- TotalThreadsUserTime: 2.151ms
> ...
>  - Tota

[jira] [Created] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10235:
-

 Summary: Averaged timer profile counters can be negative for 
trivial queries
 Key: IMPALA-10235
 URL: https://issues.apache.org/jira/browse/IMPALA-10235
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
 Attachments: profile-output.txt

Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}






[jira] [Resolved] (IMPALA-10189) Avoid unnecessarily loading metadata for drop stats DDL

2020-10-13 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10189.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Avoid unnecessarily loading metadata for drop stats DDL
> ---
>
> Key: IMPALA-10189
> URL: https://issues.apache.org/jira/browse/IMPALA-10189
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, 
> Impala 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
> Fix For: Impala 4.0
>
>







[jira] [Commented] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213270#comment-17213270
 ] 

Tim Armstrong commented on IMPALA-10230:


I can reproduce on master when manually modifying the incremental stats. I 
tweaked the Metastore SQL a bit to work on my postgres database:
{noformat}
SELECT d."NAME",t."TBL_NAME",p.*,pp.*
FROM "PARTITIONS" p join
 "TBLS" t on p."TBL_ID" = t."TBL_ID" join
 "DBS" d on t."DB_ID" = d."DB_ID" join
 "PARTITION_PARAMS" pp on p."PART_ID"=pp."PART_ID"
WHERE d."NAME"='default'
  AND t."TBL_NAME"='test_column_stats';

update "PARTITION_PARAMS"
set 
"PARAM_VALUE"='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAFgAABHN0cjIYDP8A/wD/AAH9ABEWABUQFwAAACBAFgIAAA=='
 
where "PARAM_KEY"='impala_intermediate_stats_chunk0'
  and "PART_ID" in (
SELECT p."PART_ID"
FROM "PARTITIONS" p join
 "TBLS" t on p."TBL_ID" = t."TBL_ID" join
 "DBS" d on t."DB_ID" = d."DB_ID"
WHERE d."NAME"='default'
  AND t."TBL_NAME"='test_column_stats');
{noformat}

{noformat}
[localhost.EXAMPLE.COM:21050] default>  Query: compute incremental stats 
test_column_stats partition(ds=20200107)
 >  ERROR: TableLoadingException: Failed to 
load metadata for table: default.test_column_stats
 >  CAUSED BY: IllegalStateException: 
ColumnStats{avgSize_=3.0, avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5}
{noformat}

I think you're right that we should no longer write the bad stats after 
IMPALA-9699 is fixed, but it would be good if we could gracefully handle bad 
stats.

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running "compute incremental stats
> default.test partition(xx=)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use for a
> long time, and already had stats computed.
>  
>  
>   






[jira] [Updated] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10230:
---
Labels: newbie ramp-up  (was: )

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>  Labels: newbie, ramp-up
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running "compute incremental stats
> default.test partition(xx=)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use for a
> long time, and already had stats computed.
>  
>  
>   






[jira] [Commented] (IMPALA-9815) Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000.xxxx during build

2020-10-13 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213208#comment-17213208
 ] 

Joe McDonnell commented on IMPALA-9815:
---

Uploaded logredactor, java-cup, and java-cup-runtime. I'm trying 
all-build-options again.

> Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000. 
> during build
> -
>
> Key: IMPALA-9815
> URL: https://issues.apache.org/jira/browse/IMPALA-9815
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: mvn.1602463897.937399486.log
>
>
> This is an intermittent failure; sometimes 
> org.apache.hive:hive-exec:jar:3.1.3000 fails to be downloaded, breaking the 
> build. One telltale sign is a build failure happening early, at about 5 
> minutes into the build. The build error signature is:
> {code}
> 05:36:55 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: 
> Could not resolve dependencies for project 
> org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to 
> collect dependencies for [org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112 
> (compile)]: Failed to read artifact descriptor for 
> org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112: Could not transfer 
> artifact org.apache.hive:hive-exec:pom:3.1.3000.7.2.1.0-112 from/to 
> impala.cdh.repo 
> (https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven):
>  Access denied to: 
> https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven/org/apache/hive/hive-exec/3.1.3000.7.2.1.0-112/hive-exec-3.1.3000.7.2.1.0-112.pom,
>  ReasonPhrase:Forbidden. -> [Help 1]
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 05:36:55 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 05:36:55 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 05:36:55 mvn -U -s 
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala-auxiliary-tests/jenkins/m2-settings.xml
>  -U -B install -DskipTests exited with code 0
> 05:36:55 make[2]: *** [shaded-deps/CMakeFiles/shaded-deps] Error 1
> 05:36:55 make[1]: *** [shaded-deps/CMakeFiles/shaded-deps.dir/all] Error 2
> 05:36:55 make[1]: *** Waiting for unfinished jobs
> {code}






[jira] [Commented] (IMPALA-9815) Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000.xxxx during build

2020-10-13 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213201#comment-17213201
 ] 

Joe McDonnell commented on IMPALA-9815:
---

Not too long ago for IMPALA-10218, I parsed the maven logs to get the artifacts 
coming from that repo. Here they are:
{noformat}
org.cloudera.logredactor: logredactor-2.0.8.jar
org.codehaus.jackson.jackson-mapper-asl: 
jackson-mapper-asl-1.9.13-cloudera.1.jar
log4j: log4j-1.2.17-cloudera1.jar
org.codehaus.jackson.jackson-core-asl: jackson-core-asl-1.9.13-cloudera.1.jar
net.sourceforge.czt.dev.java-cup-runtime: java-cup-runtime-0.11-a-czt01-cdh.jar
net.sourceforce.cztv.dev.java-cup: java-cup-0.11-a-czt02-cdh.jar{noformat}

> Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000. 
> during build
> -
>
> Key: IMPALA-9815
> URL: https://issues.apache.org/jira/browse/IMPALA-9815
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: mvn.1602463897.937399486.log
>
>
> This is an intermittent failure; sometimes 
> org.apache.hive:hive-exec:jar:3.1.3000 fails to be downloaded, breaking the 
> build. One telltale sign is a build failure happening early, at about 5 
> minutes into the build. The build error signature is:
> {code}
> 05:36:55 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: 
> Could not resolve dependencies for project 
> org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to 
> collect dependencies for [org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112 
> (compile)]: Failed to read artifact descriptor for 
> org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112: Could not transfer 
> artifact org.apache.hive:hive-exec:pom:3.1.3000.7.2.1.0-112 from/to 
> impala.cdh.repo 
> (https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven):
>  Access denied to: 
> https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven/org/apache/hive/hive-exec/3.1.3000.7.2.1.0-112/hive-exec-3.1.3000.7.2.1.0-112.pom,
>  ReasonPhrase:Forbidden. -> [Help 1]
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 05:36:55 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 05:36:55 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 05:36:55 mvn -U -s 
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala-auxiliary-tests/jenkins/m2-settings.xml
>  -U -B install -DskipTests exited with code 0
> 05:36:55 make[2]: *** [shaded-deps/CMakeFiles/shaded-deps] Error 1
> 05:36:55 make[1]: *** [shaded-deps/CMakeFiles/shaded-deps.dir/all] Error 2
> 05:36:55 make[1]: *** Waiting for unfinished jobs
> {code}






[jira] [Commented] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread logan.zheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213127#comment-17213127
 ] 

logan.zheng commented on IMPALA-10230:
--

It is very likely to be resolved by 
https://issues.apache.org/jira/browse/IMPALA-9699
 

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running "compute incremental stats
> default.test partition(xx=)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use for a
> long time, and already had stats computed.
>  
>  
>   






[jira] [Comment Edited] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread logan.zheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213127#comment-17213127
 ] 

logan.zheng edited comment on IMPALA-10230 at 10/13/20, 1:46 PM:
-

It is very likely to be resolved in 
https://issues.apache.org/jira/browse/IMPALA-9699
  
 


was (Author: loganzheng):
It is very likely to be resolved in 
[链接标题|http://example.com]https://issues.apache.org/jira/browse/IMPALA-9699
 

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF Impala 3.4.0, running "compute incremental stats
> default.test partition(xx=)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use for a
> long time, and already had stats computed.
>  
>  
>   






[jira] [Comment Edited] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread logan.zheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213126#comment-17213126
 ] 

logan.zheng edited comment on IMPALA-10230 at 10/13/20, 1:42 PM:
-

Reproduced this issue on ASF Impala 3.4.
h3. 1. create table

create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY 
(ds int) STORED AS PARQUET;
h3. 2 create data

insert overwrite table test_column_stats partition(ds=20200101)
 select 'tt' str1 ,'20200101' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200103)
 select 'tt2' str1 ,'20200103' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200104)
 select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200105)
 select 'tt2' str1 ,'20200104' as str2 ,1 as int1;
h3. 3 compute increment stats

compute incremental stats test_column_stats partition(ds=20200101);
 compute incremental stats test_column_stats partition(ds=20200103);
 compute incremental stats test_column_stats partition(ds=20200104);
h3. 4 update metastore
{code:java}
// code placeholder: test_column_stats tab_id = 92746
SELECT d.`NAME`,t.`TBL_NAME`,p.*,pp.* FROM `PARTITIONS` p,`TBLS` t,`DBS` 
d,partition_params pp WHERE d.`NAME`='default' AND 
t.`TBL_NAME`='test_column_stats' and p.PART_ID=pp.PART_ID and p.TBL_ID=92746
{code}
h5. PARAM_VALUE contains a serialized TPartitionStats object; the key point is that num_nulls is set to -1

{code:java}
update PARTITION_PARAMS set 
PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAFgAABHN0cjIYDP8A/wD/AAH9ABEWABUQFwAAACBAFgIAAA=='
 
where PARAM_KEY='impala_intermediate_stats_chunk0'
{code}
 
{code:java}
// Intermediate state for the computation of per-column stats. Impala can 
aggregate these
// structures together to produce final stats for a column.
struct TIntermediateColumnStats {
  // One byte for each bucket of the NDV HLL computation
  1: optional binary intermediate_ndv

  // If true, intermediate_ndv is RLE-compressed
  2: optional bool is_ndv_encoded

  // Number of nulls seen so far (or -1 if nulls are not counted)
  3: optional i64 num_nulls

  // The maximum width, in bytes, of the column
  4: optional i32 max_width

  // The average width (in bytes) of the column
  5: optional double avg_width

  // The number of rows counted, needed to compute NDVs from intermediate_ndv
  6: optional i64 num_rows
}

// Per-partition statistics
struct TPartitionStats {
  // Number of rows gathered per-partition by non-incremental stats.
  // TODO: This can probably be removed in favour of the 
intermediate_col_stats, but doing
  // so would interfere with the non-incremental stats path
  1: required TTableStats stats

  // Intermediate state for incremental statistics, one entry per column name.
  2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}
{code}
 
h3. 5. restart catalog and coordinator

clear then table partition cache
h3. 6. execute compute incremental stats

compute incremental stats test_column_stats partition(ds=20200105);
 then the following exception appears:
  
{code:java}
[localhost:21000] default> compute incremental stats test_column_stats 
partition(ds=20200105);
 Query: compute incremental stats test_column_stats partition(ds=20200107)
 ERROR: TableLoadingException: Failed to load metadata for table: 
default.test_column_stats
 CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5} 
{code}
 
{code:java}
I1013 20:16:51.701009 1840603 HdfsTable.java:980] Reloading metadata for table 
definition and all partition(s) of default.test_column_stats (ALTER TABLE 
UPDATE_STATS)
I1013 20:16:51.851312 1840603 jni-util.cc:288] 
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
table: default.test_column_stats
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1032)
at 
org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:935)
at 
org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:848)
at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:358)
at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:173)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:454)
at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:287)
at org.apache.impala.catalog.Column.updateStats(Column.java:71)
at 
org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:159)
at org.apache.impala.catalog.Table.lo

[jira] [Comment Edited] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread logan.zheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213126#comment-17213126
 ] 

logan.zheng edited comment on IMPALA-10230 at 10/13/20, 1:41 PM:
-

Reproduced this issue on ASF Impala 3.4.
h3. 1. create table

create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY 
(ds int) STORED AS PARQUET;
h3. 2 create data

insert overwrite table test_column_stats partition(ds=20200101)
 select 'tt' str1 ,'20200101' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200103)
 select 'tt2' str1 ,'20200103' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200104)
 select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200105)
 select 'tt2' str1 ,'20200104' as str2 ,1 as int1;
h3. 3 compute increment stats

compute incremental stats test_column_stats partition(ds=20200101);
 compute incremental stats test_column_stats partition(ds=20200103);
 compute incremental stats test_column_stats partition(ds=20200104);
h3. 4 update metastore
{code:java}
// code placeholder: test_column_stats tab_id = 92746
SELECT d.`NAME`,t.`TBL_NAME`,p.*,pp.* FROM `PARTITIONS` p,`TBLS` t,`DBS` 
d,partition_params pp WHERE d.`NAME`='default' AND 
t.`TBL_NAME`='test_column_stats' and p.PART_ID=pp.PART_ID and p.TBL_ID=92746
{code}
h5. PARAM_VALUE contains a serialized TPartitionStats object; the key point is that num_nulls is set to -1

{code:java}
update PARTITION_PARAMS set 
PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAFgAABHN0cjIYDP8A/wD/AAH9ABEWABUQFwAAACBAFgIAAA=='
 
where PARAM_KEY='impala_intermediate_stats_chunk0'
{code}
 
{code:java}
// Intermediate state for the computation of per-column stats. Impala can 
aggregate these
// structures together to produce final stats for a column.
struct TIntermediateColumnStats {
  // One byte for each bucket of the NDV HLL computation
  1: optional binary intermediate_ndv

  // If true, intermediate_ndv is RLE-compressed
  2: optional bool is_ndv_encoded

  // Number of nulls seen so far (or -1 if nulls are not counted)
  3: optional i64 num_nulls

  // The maximum width, in bytes, of the column
  4: optional i32 max_width

  // The average width (in bytes) of the column
  5: optional double avg_width

  // The number of rows counted, needed to compute NDVs from intermediate_ndv
  6: optional i64 num_rows
}

// Per-partition statistics
struct TPartitionStats {
  // Number of rows gathered per-partition by non-incremental stats.
  // TODO: This can probably be removed in favour of the 
intermediate_col_stats, but doing
  // so would interfere with the non-incremental stats path
  1: required TTableStats stats

  // Intermediate state for incremental statistics, one entry per column name.
  2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}
{code}
 
h3. 5. restart catalog and coordinator

clear then table partition cache
h3. 6. execute compute incremental stats

compute incremental stats test_column_stats partition(ds=20200105);
 then the following exception appears:
  
{code:java}
[localhost:21000] default> compute incremental stats test_column_stats 
partition(ds=20200105);
 Query: compute incremental stats test_column_stats partition(ds=20200107)
 ERROR: TableLoadingException: Failed to load metadata for table: 
default.test_column_stats
 CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5} 
{code}
 
{code:java}
I1013 20:16:51.701009 1840603 HdfsTable.java:980] Reloading metadata for table 
definition and all partition(s) of default.test_column_stats (ALTER TABLE 
UPDATE_STATS)
I1013 20:16:51.851312 1840603 jni-util.cc:288] 
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
table: default.test_column_stats
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1032)
at 
org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:935)
at 
org.apache.impala.service.CatalogOpExecutor.alterTable(CatalogOpExecutor.java:848)
at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:358)
at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:173)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=2, numNulls_=-2}
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:149)
at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:454)
at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:287)
at org.apache.impala.catalog.Column.updateStats(Column.java:71)
at 
org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:159)
at org.apache.impala.catalog.Tab
{code}



[jira] [Commented] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread logan.zheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213120#comment-17213120
 ] 

logan.zheng commented on IMPALA-10230:
--

Reproduced this issue on Impala 3.3+.

1 create table
create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY 
(ds int) STORED AS PARQUET;

2 create data
insert overwrite table test_column_stats partition(ds=20200101)
select 'tt' str1 ,'20200101' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200103)
select 'tt2' str1 ,'20200103' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200104)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200105)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

3 compute incremental stats
compute incremental stats test_column_stats partition(ds=20200101);
compute incremental stats test_column_stats partition(ds=20200103);
compute incremental stats test_column_stats partition(ds=20200104);

4 update metastore
SELECT d.`NAME`,t.`TBL_NAME`,p.*,pp.*
FROM `PARTITIONS` p,`TBLS` t,`DBS` d,partition_params pp 
 WHERE  d.`NAME`='default' AND t.`TBL_NAME`='test_column_stats' 
 and p.PART_ID=pp.PART_ID  
 and p.TBL_ID=92746
update PARTITION_PARAMS
set 
PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAFgAABHN0cjIYDP8A/wD/AAH9ABEWABUQFwAAACBAFgIAAA=='
where  PARAM_KEY='impala_intermediate_stats_chunk0'
PARAM_VALUE contains a serialized TPartitionStats object; the key point is num_nulls=-1
// Intermediate state for the computation of per-column stats. Impala can
// aggregate these structures together to produce final stats for a column.
struct TIntermediateColumnStats {
  // One byte for each bucket of the NDV HLL computation
  1: optional binary intermediate_ndv

  // If true, intermediate_ndv is RLE-compressed
  2: optional bool is_ndv_encoded

  // Number of nulls seen so far (or -1 if nulls are not counted)
  3: optional i64 num_nulls

  // The maximum width, in bytes, of the column
  4: optional i32 max_width

  // The average width (in bytes) of the column
  5: optional double avg_width

  // The number of rows counted, needed to compute NDVs from intermediate_ndv
  6: optional i64 num_rows
}

// Per-partition statistics
struct TPartitionStats {
  // Number of rows gathered per-partition by non-incremental stats.
  // TODO: This can probably be removed in favour of the intermediate_col_stats,
  // but doing so would interfere with the non-incremental stats path
  1: required TTableStats stats

  // Intermediate state for incremental statistics, one entry per column name.
  2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}
5. restart catalog and coordinator
This clears the table and partition cache.

6. execute compute incremental stats
compute incremental stats test_column_stats partition(ds=20200105);
then you will see this exception:

[localhost:21000] default> compute incremental stats test_column_stats 
partition(ds=20200105);
Query: compute incremental stats test_column_stats partition(ds=20200107)
ERROR: TableLoadingException: Failed to load metadata for table: 
default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5}

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> After upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "compute
> incremental stats default.test partition(xx=)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use for a
> long time, and already had stats computed on it. 
>  
>  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10230) column stats num_nulls less than -1

2020-10-13 Thread logan.zheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213116#comment-17213116
 ] 

logan.zheng commented on IMPALA-10230:
--

## Reproduced this issue on Impala 3.3+
### 1 create table 
create table test_column_stats(str1 string,str2 string,int1 int) PARTITIONED BY 
(ds int) STORED AS PARQUET;
### 2 create data 
insert overwrite table test_column_stats partition(ds=20200101)
select 'tt' str1 ,'20200101' as str2 ,1 as int1;


insert overwrite table test_column_stats partition(ds=20200103)
select 'tt2' str1 ,'20200103' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200104)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

insert overwrite table test_column_stats partition(ds=20200105)
select 'tt2' str1 ,'20200104' as str2 ,1 as int1;

### 3 compute incremental stats
compute incremental stats test_column_stats partition(ds=20200101);
compute incremental stats test_column_stats partition(ds=20200103);
compute incremental stats test_column_stats partition(ds=20200104);


### 4 update metastore 
```
SELECT d.`NAME`,t.`TBL_NAME`,p.*,pp.*
FROM `PARTITIONS` p,`TBLS` t,`DBS` d,partition_params pp 
 WHERE d.`NAME`='default' AND t.`TBL_NAME`='test_column_stats' 
 and p.PART_ID=pp.PART_ID 
 and p.TBL_ID=92746
```
```
update PARTITION_PARAMS
set 
PARAM_VALUE='HBYCABsDjARpbnQxGAz/AK4AAAH/AP8ATwARFgAVABcQQBYCAARzdHIxGAj/AP8A/wD/ABEWARUAFwAAFgAABHN0cjIYDP8A/wD/AAH9ABEWABUQFwAAACBAFgIAAA=='
where PARAM_KEY='impala_intermediate_stats_chunk0'
```
# PARAM_VALUE contains a serialized TPartitionStats object; the key point is num_nulls=-1

```
// Intermediate state for the computation of per-column stats. Impala can
// aggregate these structures together to produce final stats for a column.
struct TIntermediateColumnStats {
 // One byte for each bucket of the NDV HLL computation
 1: optional binary intermediate_ndv

// If true, intermediate_ndv is RLE-compressed
 2: optional bool is_ndv_encoded

// Number of nulls seen so far (or -1 if nulls are not counted)
 3: optional i64 num_nulls

// The maximum width, in bytes, of the column
 4: optional i32 max_width

// The average width (in bytes) of the column
 5: optional double avg_width

// The number of rows counted, needed to compute NDVs from intermediate_ndv
 6: optional i64 num_rows
}

// Per-partition statistics
struct TPartitionStats {
 // Number of rows gathered per-partition by non-incremental stats.
 // TODO: This can probably be removed in favour of the intermediate_col_stats,
 // but doing so would interfere with the non-incremental stats path
 1: required TTableStats stats

// Intermediate state for incremental statistics, one entry per column name.
 2: optional map<string, TIntermediateColumnStats> intermediate_col_stats
}

```
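
For reference, a chunk value like the one above can be inspected offline. The sketch below rests on assumptions not confirmed in this thread: that the value is the base64 encoding of a TCompactProtocol-serialized TPartitionStats, and that Python classes have been generated from Impala's CatalogObjects.thrift (the module path used in the import is illustrative only):
```
import base64
import sys

from thrift.protocol.TCompactProtocol import TCompactProtocol
from thrift.transport.TTransport import TMemoryBuffer

# Assumed to come from running `thrift --gen py` on Impala's
# common/thrift/CatalogObjects.thrift; the module path is only illustrative.
from CatalogObjects.ttypes import TPartitionStats


def decode_chunk(b64_value):
    """Deserialize one impala_intermediate_stats_chunk* parameter value."""
    raw = base64.b64decode(b64_value)
    stats = TPartitionStats()
    stats.read(TCompactProtocol(TMemoryBuffer(raw)))
    return stats


if __name__ == "__main__":
    # Pass the PARAM_VALUE string from step 4 as the first argument.
    stats = decode_chunk(sys.argv[1])
    for col, col_stats in (stats.intermediate_col_stats or {}).items():
        print(col, "num_nulls =", col_stats.num_nulls)
```
If the encoding assumptions hold, running this against the crafted PARAM_VALUE should print num_nulls = -1 for each column, which is the sentinel that matters in the next steps.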

### 5. Restart the catalog and coordinator
This clears the catalog's cached table and partition metadata.

### 6. Execute compute incremental stats
compute incremental stats test_column_stats partition(ds=20200105);
The statement then fails with an exception:
```
[localhost:21000] default> compute incremental stats test_column_stats 
partition(ds=20200105);
Query: compute incremental stats test_column_stats partition(ds=20200107)
ERROR: TableLoadingException: Failed to load metadata for table: 
default.test_column_stats
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=3.0, 
avgSerializedSize_=15.0, maxSize_=3, numDistinct_=1, numNulls_=-5}
```
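
The Thrift comment above already hints at the failure mode: num_nulls = -1 means "nulls were not counted", so it is a sentinel rather than a real count. If such sentinels are summed across partitions as if they were counts, a few partitions reporting -1 add up to a value below -1 (such as the numNulls_=-5 above), which the catalog then rejects. A minimal illustration of that arithmetic (plain Python, not Impala's actual aggregation code):
```
# Illustrative sketch of the num_nulls aggregation pitfall; not Impala code,
# just the arithmetic behind "column stats num_nulls less than -1".

SENTINEL = -1  # "nulls were not counted" per TIntermediateColumnStats.num_nulls


def aggregate_naive(per_partition_num_nulls):
    # Buggy pattern: sentinels are summed like real counts, so a handful of
    # partitions with -1 produce a total below -1.
    return sum(per_partition_num_nulls)


def aggregate_sentinel_aware(per_partition_num_nulls):
    # If any partition did not count nulls, the aggregate is unknown (-1).
    if any(n == SENTINEL for n in per_partition_num_nulls):
        return SENTINEL
    return sum(per_partition_num_nulls)


counts = [SENTINEL] * 5
print(aggregate_naive(counts))           # -5 -> fails a "num_nulls >= -1" check
print(aggregate_sentinel_aware(counts))  # -1 -> stays a valid "unknown" value
```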

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> After upgrading Impala from 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "compute 
> incremental stats default.test partition(xx=)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been in use for a 
> long time, and already had stats computed on it. 
>  
>  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10055) DCHECK was hit while executing e2e test TestQueries::test_subquery

2020-10-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213079#comment-17213079
 ] 

Zoltán Borók-Nagy commented on IMPALA-10055:


We hit this DCHECK on corrupt ORC files, so I've converted the DCHECK to an 
error: https://gerrit.cloudera.org/#/c/16591/

> DCHECK was hit while executing e2e test TestQueries::test_subquery
> --
>
> Key: IMPALA-10055
> URL: https://issues.apache.org/jira/browse/IMPALA-10055
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Zoltán Borók-Nagy
>Priority: Blocker
>  Labels: broken-build, crash, flaky
> Fix For: Impala 4.0
>
>
> A DCHECK was hit while executing e2e test. Time frame suggests that it 
> possibly happened while executing TestQueries::test_subquery:
> {code}
> query_test/test_queries.py:149: in test_subquery
> self.run_test_case('QueryTest/subquery', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:909: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:334: in execute
> r = self.__fetch_results(handle, profile_format=profile_format)
> common/impala_connection.py:436: in __fetch_results
> result_tuples = cursor.fetchall()
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:532:
>  in fetchall
> self._wait_to_finish()
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:405:
>  in _wait_to_finish
> resp = self._last_operation._rpc('GetOperationStatus', req)
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:992:
>  in _rpc
> response = self._execute(func_name, request)
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:1023:
>  in _execute
> .format(self.retries))
> E   HiveServer2Error: Failed after retrying 3 times
> {code}
> impalad log:
> {code}
> Log file created at: 2020/08/05 17:34:30
> Running on machine: 
> impala-ec2-centos74-m5-4xlarge-ondemand-18a5.vpc.cloudera.com
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> F0805 17:34:30.003247 10887 orc-column-readers.cc:423] 
> c34e87376f496a53:7ba6a2e40002] Check failed: (scanner_->row_batches_nee
> d_validation_ && scanner_->scan_node_->IsZeroSlotTableScan()) || 
> scanner_->acid_original_file
> {code}
> Stack trace:
> {code}
> CORE: ./fe/core.1596674070.14179.impalad
> BINARY: ./be/build/latest/service/impalad
> Core was generated by 
> `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x7efd6ec6e1f7 in raise () from /lib64/libc.so.6
> To enable execution of this file add
>   add-auto-load-safe-path 
> /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib64/libstdc++.so.6.0.24-gdb.py
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> To completely disable this security protection add
>   set auto-load safe-path /
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> For more information about this security protection see the
> "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
>   info "(gdb)Auto-loading safe path"
> #0  0x7efd6ec6e1f7 in raise () from /lib64/libc.so.6
> #1  0x7efd6ec6f8e8 in abort () from /lib64/libc.so.6
> #2  0x086b8ea4 in google::DumpStackTraceAndExit() ()
> #3  0x086ae25d in google::LogMessage::Fail() ()
> #4  0x086afb4d in google::LogMessage::SendToLog() ()
> #5  0x086adbbb in google::LogMessage::Flush() ()
> #6  0x086b17b9 in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x0388e10a in impala::OrcStructReader::TopLevelReadValueBatch 
> (this=0x61162630, scratch_batch=0x824831e0, pool=0x82483258) at 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/orc-column-readers.cc:421
> #8  0x03810c92 in impala::HdfsOrcScanner::TransferTuples 
> (this=0x27143c00, dst_batch=0x2e5ca820) at 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdf

[jira] [Commented] (IMPALA-9815) Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000.xxxx during build

2020-10-13 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212883#comment-17212883
 ] 

Quanlong Huang commented on IMPALA-9815:


Still hitting the same issue, this time on org.cloudera.logredactor:logredactor:jar:2.0.8: 
[https://jenkins.impala.io/job/all-build-options-ub1604/6494]
{code:java}
[ERROR] Failed to execute goal on project impala-minimal-hive-exec: Could not 
resolve dependencies for project 
org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to collect 
dependencies at org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-287 -> 
org.cloudera.logredactor:logredactor:jar:2.0.8: Failed to read artifact 
descriptor for org.cloudera.logredactor:logredactor:jar:2.0.8: Could not 
transfer artifact org.cloudera.logredactor:logredactor:pom:2.0.8 from/to 
impala.cdp.repo 
(https://native-toolchain.s3.amazonaws.com/build/cdp_components/4493826/maven): 
Access denied to: 
https://native-toolchain.s3.amazonaws.com/build/cdp_components/4493826/maven/org/cloudera/logredactor/logredactor/2.0.8/logredactor-2.0.8.pom
 , ReasonPhrase:Forbidden. -> [Help 1]{code}
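
To separate a transient repository failure from a genuinely missing artifact, it can help to probe the failing URL outside of Maven. Below is a small hypothetical diagnostic (not part of the Impala build scripts), bearing in mind that S3 commonly answers 403 Forbidden rather than 404 for keys that do not exist when the caller lacks list permission:
{code}
import urllib.error
import urllib.request

# The URL is the one from the Maven error above.
URL = ("https://native-toolchain.s3.amazonaws.com/build/cdp_components/4493826/"
       "maven/org/cloudera/logredactor/logredactor/2.0.8/logredactor-2.0.8.pom")

try:
    with urllib.request.urlopen(URL, timeout=30) as resp:
        # A 200 here suggests the earlier failure was transient.
        print(resp.status, resp.headers.get("Content-Length"))
except urllib.error.HTTPError as err:
    # On S3, 403 can mean either an access-policy problem or a missing key.
    print("HTTP error:", err.code, err.reason)
except urllib.error.URLError as err:
    print("Connection problem:", err.reason)
{code}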

> Intermittent failure downloading org.apache.hive:hive-exec:jar:3.1.3000. 
> during build
> -
>
> Key: IMPALA-9815
> URL: https://issues.apache.org/jira/browse/IMPALA-9815
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Laszlo Gaal
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Attachments: mvn.1602463897.937399486.log
>
>
> This is an intermittent failure; sometimes 
> org.apache.hive:hive-exec:jar:3.1.3000 fails to be downloaded, breaking the 
> build. One telltale sign is a build failure happening early, at about 5 
> minutes into the build. The build error signature is:
> {code}
> 05:36:55 [ERROR] Failed to execute goal on project impala-minimal-hive-exec: 
> Could not resolve dependencies for project 
> org.apache.impala:impala-minimal-hive-exec:jar:0.1-SNAPSHOT: Failed to 
> collect dependencies for [org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112 
> (compile)]: Failed to read artifact descriptor for 
> org.apache.hive:hive-exec:jar:3.1.3000.7.2.1.0-112: Could not transfer 
> artifact org.apache.hive:hive-exec:pom:3.1.3000.7.2.1.0-112 from/to 
> impala.cdh.repo 
> (https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven):
>  Access denied to: 
> https://native-toolchain.s3.amazonaws.com/build/cdh_components/1814051/maven/org/apache/hive/hive-exec/3.1.3000.7.2.1.0-112/hive-exec-3.1.3000.7.2.1.0-112.pom,
>  ReasonPhrase:Forbidden. -> [Help 1]
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] To see the full stack trace of the errors, re-run Maven with 
> the -e switch.
> 05:36:55 [ERROR] Re-run Maven using the -X switch to enable full debug 
> logging.
> 05:36:55 [ERROR] 
> 05:36:55 [ERROR] For more information about the errors and possible 
> solutions, please read the following articles:
> 05:36:55 [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 05:36:55 mvn -U -s 
> /data/jenkins/workspace/impala-cdpd-master-core/repos/Impala-auxiliary-tests/jenkins/m2-settings.xml
>  -U -B install -DskipTests exited with code 0
> 05:36:55 make[2]: *** [shaded-deps/CMakeFiles/shaded-deps] Error 1
> 05:36:55 make[1]: *** [shaded-deps/CMakeFiles/shaded-deps.dir/all] Error 2
> 05:36:55 make[1]: *** Waiting for unfinished jobs
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org