[GitHub] [madlib] fmcquillan99 commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA

GitBox Mon, 28 Oct 2019 09:23:20 -0700

fmcquillan99 commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA
URL: https://github.com/apache/madlib/pull/432#issuecomment-547026871
 
 
   (4)
   Unnecessary verbose output
   
   ```
   DROP TABLE IF EXISTS documents_tf, documents_tf_vocabulary;
   
   SELECT madlib.term_frequency('documents',    -- input table
                                'docid',        -- document id column
                                'words',        -- vector of words in document
                                'documents_tf', -- output documents table with term frequency
                                TRUE);          -- TRUE to create vocabulary table
   ```
   produces
   ```
   NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
   CONTEXT:  SQL statement "
                    CREATE TABLE documents_tf_vocabulary AS
                    SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid,
                           word::TEXT
                    FROM (
                       SELECT distinct(words) as word
                       FROM (
                             SELECT unnest(words::TEXT[]) as words
                             FROM documents
                       ) q1
                   ) q2
                   "
   PL/Python function "term_frequency"
   NOTICE:  One or more columns in the following table(s) do not have statistics: documents
   HINT:  For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
   CONTEXT:  SQL statement "
                    CREATE TABLE documents_tf_vocabulary AS
                    SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid,
                           word::TEXT
                    FROM (
                       SELECT distinct(words) as word
                       FROM (
                             SELECT unnest(words::TEXT[]) as words
                             FROM documents
                       ) q1
                   ) q2
                   "
   PL/Python function "term_frequency"
   NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'docid' as the Greenplum Database data distribution key for this table.
   HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
   CONTEXT:  SQL statement "
           CREATE TABLE documents_tf(
               docid INTEGER,
               wordid INTEGER,
               count INTEGER
           )
           "
   PL/Python function "term_frequency"
   NOTICE:  One or more columns in the following table(s) do not have statistics: documents
   HINT:  For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
   CONTEXT:  SQL statement "
           INSERT INTO documents_tf
               SELECT docid, w.wordid as wordid, word_count as count
               FROM (
                   SELECT docid, word::TEXT, count(*) as word_count
                   FROM
                   (
                       SELECT docid, unnest(words::TEXT[]) as word
                       FROM documents
                       WHERE
                           docid IS NOT NULL
                   ) q1
                   GROUP BY docid, word
               ) q2
               
               , documents_tf_vocabulary as w
               WHERE
                   q2.word = w.word
               
           "
   PL/Python function "term_frequency"
                                         term_frequency
   ------------------------------------------------------------------------------------------
    Term frequency output in table documents_tf, vocabulary in table documents_tf_vocabulary
   (1 row)
   
   Time: 206.233 ms
   ```
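
   A possible session-level workaround, offered as a sketch and not as the fix proposed in this PR: raising the standard `client_min_messages` setting to `warning` should keep NOTICE-level messages, along with the HINT and CONTEXT lines attached to them, from reaching the client. It assumes the same `documents` table and call as above, and it does not change what the function emits internally.

   ```sql
   -- Assumption: hiding NOTICE output for this session is acceptable.
   SET client_min_messages TO warning;

   DROP TABLE IF EXISTS documents_tf, documents_tf_vocabulary;

   SELECT madlib.term_frequency('documents',    -- input table
                                'docid',        -- document id column
                                'words',        -- vector of words in document
                                'documents_tf', -- output documents table with term frequency
                                TRUE);          -- TRUE to create vocabulary table

   -- Restore the session default afterwards.
   RESET client_min_messages;
   ```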


