fmcquillan99 commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA
URL: https://github.com/apache/madlib/pull/432#issuecomment-547026871

(4) Unnecessary verbose output

```
DROP TABLE IF EXISTS documents_tf, documents_tf_vocabulary;

SELECT madlib.term_frequency('documents',    -- input table
                             'docid',        -- document id column
                             'words',        -- vector of words in document
                             'documents_tf', -- output documents table with term frequency
                             TRUE);          -- TRUE to create vocabulary table
```
produces
```
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
CONTEXT: SQL statement "
        CREATE TABLE documents_tf_vocabulary AS
        SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid, word::TEXT
        FROM (
            SELECT distinct(words) as word
            FROM (
                SELECT unnest(words::TEXT[]) as words
                FROM documents
            ) q1
        ) q2
        "
PL/Python function "term_frequency"
NOTICE: One or more columns in the following table(s) do not have statistics: documents
HINT: For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
CONTEXT: SQL statement "
        CREATE TABLE documents_tf_vocabulary AS
        SELECT (row_number() OVER (order by word))::INTEGER - 1 as wordid, word::TEXT
        FROM (
            SELECT distinct(words) as word
            FROM (
                SELECT unnest(words::TEXT[]) as words
                FROM documents
            ) q1
        ) q2
        "
PL/Python function "term_frequency"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'docid' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
        CREATE TABLE documents_tf(
            docid INTEGER,
            wordid INTEGER,
            count INTEGER
        )
        "
PL/Python function "term_frequency"
NOTICE: One or more columns in the following table(s) do not have statistics: documents
HINT: For non-partitioned tables, run analyze <table_name>(<column_list>). For partitioned tables, run analyze rootpartition <table_name>(<column_list>). See log for columns missing statistics.
CONTEXT: SQL statement "
        INSERT INTO documents_tf
        SELECT docid, w.wordid as wordid, word_count as count
        FROM (
            SELECT docid, word::TEXT, count(*) as word_count
            FROM (
                SELECT docid, unnest(words::TEXT[]) as word
                FROM documents
                WHERE docid IS NOT NULL
            ) q1
            GROUP BY docid, word
        ) q2, documents_tf_vocabulary as w
        WHERE q2.word = w.word
        "
PL/Python function "term_frequency"
                                      term_frequency
------------------------------------------------------------------------------------------
 Term frequency output in table documents_tf, vocabulary in table documents_tf_vocabulary
(1 row)

Time: 206.233 ms
```
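
Until `term_frequency` quiets these messages itself, a caller can hide them for the session by raising `client_min_messages`. This is a minimal sketch, assuming the standard PostgreSQL/Greenplum `client_min_messages` setting and the same table names as in the call above. It is a client-side workaround, not the change made in this PR:

```sql
-- Hide NOTICE-level messages (and their HINT/CONTEXT lines) for this session.
SET client_min_messages TO warning;

DROP TABLE IF EXISTS documents_tf, documents_tf_vocabulary;

SELECT madlib.term_frequency('documents',    -- input table
                             'docid',        -- document id column
                             'words',        -- vector of words in document
                             'documents_tf', -- output documents table with term frequency
                             TRUE);          -- TRUE to create vocabulary table

-- Restore the default message level afterwards.
RESET client_min_messages;
```

With this setting only WARNING and above reach the client, so the result row and timing should remain while the four NOTICE blocks disappear. The notices are still generated on the server side.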
