fmcquillan99 edited a comment on issue #432: MADLIB-1351 : Added stopping 
criteria on perplexity to LDA
URL: https://github.com/apache/madlib/pull/432#issuecomment-548912324
 
 
   (6)  
   NULLs are not handled properly for the `evaluate_every` and `perplexity_tol` arguments
   ```
   DROP TABLE IF EXISTS lda_model_perp, lda_output_data_perp;
   
   SELECT madlib.lda_train( 'documents_tf',          -- documents table in the form of term frequency
                            'lda_model_perp',        -- model table created by LDA training (not human readable)
                            'lda_output_data_perp',  -- readable output data table
                            384,                     -- vocabulary size
                            5,                       -- number of topics
                            20,                      -- number of iterations
                            5,                       -- Dirichlet prior for the per-doc topic multinomial (alpha)
                            0.01,                    -- Dirichlet prior for the per-topic word multinomial (beta)
                            NULL,                    -- evaluate perplexity every n iterations
                            NULL                     -- stopping perplexity tolerance
                          );
   
   InternalError: (psycopg2.InternalError) plpy.Error: invalid argument: perplexity_tol should not be less than 0 (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "lda_train", line 22, in <module>
       voc_size, topic_num, iter_num, alpha, beta, evaluate_every, perplexity_tol)
     PL/Python function "lda_train", line 525, in lda_train
     PL/Python function "lda_train", line 96, in _assert
   PL/Python function "lda_train"
   ```
   
   Please implement as per the documented behavior:
   ```
   evaluate_every (optional)
   INTEGER, default: 0. How often to evaluate perplexity. Set it to 0 or a negative number to not evaluate perplexity in training at all. Evaluating perplexity can help you check convergence during the training process, but it will also increase total training time. For example, evaluating perplexity in every iteration might increase training time up to two-fold.
   perplexity_tol (optional)
   DOUBLE PRECISION, default: 0.1. Perplexity tolerance to stop iteration. Only used when the parameter 'evaluate_every' is greater than 0.
   ```
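   In other words, a NULL for either optional argument should fall back to its documented default before any range check runs, and `perplexity_tol` should only be validated when it will actually be used. A minimal sketch of that defaulting logic (a hypothetical helper, not the actual MADlib code; in PL/Python a SQL NULL arrives as Python `None`):
   ```python
   def _resolve_perplexity_args(evaluate_every, perplexity_tol):
       # NULL evaluate_every -> default 0: do not evaluate perplexity at all.
       evaluate_every = 0 if evaluate_every is None else evaluate_every
       # NULL perplexity_tol -> documented default 0.1.
       perplexity_tol = 0.1 if perplexity_tol is None else perplexity_tol
       # Only validate the tolerance when perplexity evaluation is enabled,
       # since the docs say it is ignored when evaluate_every <= 0.
       if evaluate_every > 0 and perplexity_tol < 0:
           raise ValueError("perplexity_tol should not be less than 0")
       return evaluate_every, perplexity_tol
   ```
   With this defaulting in place, the `NULL, NULL` call above would resolve to `(0, 0.1)` and train without evaluating perplexity instead of raising the error.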
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
