Github user takuti commented on the issue:

https://github.com/apache/incubator-hivemall/pull/82

@helenahm If I understand correctly, this behavior sounds natural. Make sure you distinguish between a "**null**" and an "**empty**" document.

A **null** document must not exist in your document collection (or, as a workaround, you need to manually add the `AND ln != null` clause as you already tried; i.e., carry out the preprocessing step carefully).

In fact, since `doc#3` is **null**, the following query throws the exception you reported:

```sql
with docs as (
  select docid, doc
  from (
    select 1 as docid, "Fruits and vegetables are healthy naïve." as doc
    union all
    select 2 as docid, "I like apples, oranges, and avocados. I do not like the flu or colds." as doc
    union all
    select 3 as docid, null as doc
  ) t1
),
word_counts as (
  select
    docid,
    feature(word, count(word)) as f
  from docs t1
  LATERAL VIEW explode(tokenize(doc, true)) t2 as word
  where not is_stopword(word)
  group by docid, word
)
select
  label, word, avg(lambda) as lambda
from (
  select
    -- train_plsa(feature, "-topics 2 -eps 0.00001 -iter 2048 -alpha 0.01") as (label, word, lambda)
    train_lda(feature, "-topics 2 -iter 20") as (label, word, lambda)
  from (
    select docid, collect_set(f) as feature
    from word_counts
    group by docid
    -- order by docid
  ) t1
) t2
group by label, word
order by lambda desc
;
```

However, if the document is just **empty**, it works:

```sql
with docs as (
  select docid, doc
  from (
    select 1 as docid, "Fruits and vegetables are healthy naïve." as doc
    union all
    select 2 as docid, "I like apples, oranges, and avocados. I do not like the flu or colds." as doc
    union all
    select 3 as docid, "" as doc
  ) t1
),
...
```
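
One way to do the preprocessing mentioned above is to filter out null documents in the `docs` CTE, before they reach `tokenize()`. The following is only a minimal sketch under that assumption: it reuses the sample rows from the queries above, the placement of the `where doc is not null` filter is an illustration rather than something confirmed in this thread, and it has not been run against Hivemall.

```sql
with docs as (
  select docid, doc
  from (
    select 1 as docid, "Fruits and vegetables are healthy." as doc
    union all
    select 3 as docid, null as doc
  ) t1
  -- assumption: dropping null documents here keeps them out of tokenize()
  where doc is not null
)
select docid, doc
from docs
;
```

If null documents should be kept as empty ones instead, replacing the filter with `coalesce(doc, "") as doc` in the select list would turn them into the empty-string case, which is the case reported to work above.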