Github user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/82
@helenahm If I understand correctly, that behavior sounds natural. Make sure you
distinguish between a "**null**" and an "**empty**" document. A **null** document must
not exist in your document collection (otherwise you need to manually add the `AND ln
!= null` clause as a workaround, as you already tried; i.e., preprocess the
collection carefully).
In fact, since `doc#3` is **null**, the following query, which you wrote,
throws an exception:
```sql
with docs as (
select docid, doc
from (
select 1 as docid, "Fruits and vegetables are healthy naïve." as doc
union all
select 2 as docid, "I like apples, oranges, and avocados. I do not like the flu or colds." as doc
union all
select 3 as docid, null as doc
) t1
),
word_counts as (
select
docid,
feature(word, count(word)) as f
from docs t1 LATERAL VIEW explode(tokenize(doc, true)) t2 as word
where
not is_stopword(word)
group by
docid, word
)
select label, word, avg(lambda) as lambda
from (
select
-- train_plsa(feature, "-topics 2 -eps 0.00001 -iter 2048 -alpha 0.01") as (label, word, lambda)
train_lda(feature, "-topics 2 -iter 20") as (label, word, lambda)
from (
select docid, collect_set(f) as feature
from word_counts
group by docid
-- order by docid
) t1
) t2
group by label, word
order by lambda desc
;
```
However, if the document is just **empty**, it works:
```sql
with docs as (
select docid, doc
from (
select 1 as docid, "Fruits and vegetables are healthy naïve." as doc
union all
select 2 as docid, "I like apples, oranges, and avocados. I do not like the flu or colds." as doc
union all
select 3 as docid, "" as doc
) t1
),
...
```
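For reference, the preprocessing step mentioned above could look roughly like the following. This is only a sketch: it assumes the null check is applied inside the `docs` CTE, before any tokenization, and the sample rows are shortened from the queries above. It uses `is not null` because a comparison such as `!= null` evaluates to null rather than true, so it never selects a row.

```sql
-- Sketch: drop null documents up front so they never reach tokenize().
-- Empty documents ("") are kept, since they are not null.
with docs as (
  select docid, doc
  from (
    select 1 as docid, "Fruits and vegetables are healthy." as doc
    union all
    select 2 as docid, "" as doc
    union all
    select 3 as docid, null as doc
  ) t1
  where doc is not null
)
select docid, doc
from docs
;
```

With the filter in place, only `docid` 1 and 2 remain, and the `word_counts` and `train_lda` parts of the query can stay as they are.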