[ https://issues.apache.org/jira/browse/IMPALA-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-7214 stopped by Alex Rodoni.
-------------------------------------------

> Lots of misleading/incorrect use of DataNode in Impala docs
> -----------------------------------------------------------
>
>                 Key: IMPALA-7214
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7214
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Docs
>    Affects Versions: Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Alex Rodoni
>            Priority: Major
>
> The docs tend to conflate DataNodes (an HDFS service) and Impala daemons. I
> think this stems from the original deployment practice of always colocating
> Impala daemons with HDFS DataNodes so that HDFS data could always be read
> from a local DataNode.
> I'm a bit pedantic, so the conflation feels wrong to me regardless, but I
> think this will become increasingly confusing as alternative deployments
> without colocated HDFS DataNodes become more common (e.g. running against S3,
> running with a separate HDFS service).
> E.g., picking an example at random:
> {noformat}
> In Impala 1.4.0 and higher, the <codeph>LIMIT</codeph> clause is now optional (rather than required) for
> queries that use the <codeph>ORDER BY</codeph> clause. Impala automatically uses a temporary disk work area
> to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular
> DataNode.
> {noformat}
> This is wrong because the memory limit is for an Impala daemon, which is the
> process that does the actual sorting. So here I think it should be "Impala
> daemon" instead of "DataNode".

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)