yep.

On Fri, Mar 5, 2010 at 7:55 PM, Claudio Martella <[email protected]> wrote:

> Thanks!
>
> I'll try (a) and maybe some Dirichlet Process Clustering. I notice
> that LDA also needs maxWords. As I understand it, that is the length of
> the dictionary.txt (the number of unique words in my vectors) I got from
> lucene.vectors. Is that correct?
>
>
> Ted Dunning wrote:
> > This is a difficult topic that is addressed in different ways in practical
> > situations.  The approaches I know of include:
> >
> > a) just pick a number that is probably big enough and go forward.  20, 30,
> > 50 or 100 are all viable choices depending on the scale of your corpus.
> > Numbers as small as 5 might make sense for special purpose cases such as
> > voting histories.
> >
> > b) run a parameter sweep over the number of topics and look at the posterior
> > likelihood of your corpus.  This is pretty commonly done.
> >
> > c) move to a more advanced non-parametric Bayesian approach where your
> > learning algorithm basically does (b) in a single learning process.  I
> > haven't heard of anyone doing this in applied situations yet, but it is a
> > very seductive goal.
> >
> > Only (a) and (b) are viable in Mahout's implementation of LDA.  Option (c)
> > is implemented in our Dirichlet Process clustering, but that is less
> > powerful in some ways than LDA.
> >
> > On Thu, Mar 4, 2010 at 6:56 AM, Claudio Martella <[email protected]> wrote:
> >
> >> The documents span different topics and I don't know their number in
> >> advance (and would LOVE to avoid having to specify it). Do you have any
> >> advice on a strategy to follow?
> >>
> >>
> >
> >
> >
> >
>
>
> --
> Claudio Martella
>
>
>
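
Option (b) in the quoted reply, a parameter sweep over the number of topics,
could be sketched roughly as below. This is only an illustration and not
Mahout code: sweep_topic_counts and toy_score are hypothetical names, and
toy_score is a made-up stand-in for "train LDA with k topics and return the
log-likelihood of the corpus", which you would have to supply yourself.

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_topic_counts(
    candidates: Iterable[int],
    score_model: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Score every candidate topic count and return (best_k, all_scores).

    score_model(k) is assumed to train a topic model with k topics and
    return a log-likelihood for the corpus, where higher is better.
    """
    scores = {k: score_model(k) for k in candidates}
    if not scores:
        raise ValueError("no candidate topic counts given")
    # Pick the topic count with the highest score; ties go to the
    # candidate that was listed first.
    best_k = max(scores, key=scores.get)
    return best_k, scores


def toy_score(k: int) -> float:
    # Hypothetical placeholder, NOT a real likelihood: an artificial curve
    # that peaks at k = 30 so the sweep has something to find.
    return -1000.0 - float((k - 30) ** 2)


if __name__ == "__main__":
    best_k, scores = sweep_topic_counts([5, 20, 30, 50, 100], toy_score)
    for k, score in sorted(scores.items()):
        print(k, score)
    print("best number of topics:", best_k)
```

In a real run each score_model call is a full training job, so the candidate
list is usually kept short (for example the values suggested under (a)).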
