Yep, maxWords is the number of unique terms in your dictionary.txt.
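
If it helps, here is a minimal Python sketch for counting the entries in dictionary.txt so you can pass the result as maxWords. The function name is made up for illustration, and the layout is an assumption: one term per non-empty line, with any header lines starting with '#'. Check your own file first, since the exact format lucene.vectors writes may differ.

```python
from pathlib import Path


def count_dictionary_terms(path):
    """Count term entries in a dictionary file.

    Assumption (not verified against lucene.vectors output): one term per
    non-empty line, and header/comment lines start with '#'.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            # Skip blank lines and anything that looks like a header.
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

Call it as `count_dictionary_terms("dictionary.txt")`. If your file starts with a bare count line or some other header, adjust the skip rule accordingly.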

On Fri, Mar 5, 2010 at 7:55 PM, Claudio Martella <[email protected]> wrote:

> Thanks!
>
> I'll try with (a) and maybe some Dirichlet Process Clustering. I notice
> that LDA also needs maxWords. In my understanding that's the length of
> the dictionary.txt (the number of unique words in my vectors) I got from
> lucene.vectors. Is that correct?
>
> Ted Dunning wrote:
> > This is a difficult topic that is addressed in different ways in
> > practical situations. The approaches I know of include:
> >
> > a) just pick a number that is probably big enough and go forward. 20,
> > 30, 50 or 100 are all viable choices depending on the scale of your
> > corpus. Numbers as small as 5 might make sense for special purpose
> > cases such as voting histories.
> >
> > b) run a parameter sweep over the number of topics and look at the
> > posterior likelihood of your corpus. This is pretty commonly done.
> >
> > c) move to a more advanced non-parametric Bayesian approach where your
> > learning algorithm basically does (b) in a single learning process. I
> > haven't heard of anyone doing this in applied situations yet, but it
> > is a very seductive goal.
> >
> > Only (a) and (b) are viable in Mahout's implementation of LDA. Option
> > (c) is implemented in our Dirichlet Process clustering, but that is
> > less powerful in some ways than LDA.
> >
> > On Thu, Mar 4, 2010 at 6:56 AM, Claudio Martella <[email protected]>
> > wrote:
> >
> >> The documents span different topics and I don't know in advance
> >> (and would LOVE to avoid it) their number. Do you have any advice on
> >> a strategy to follow?
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
> TIS innovation park
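
P.S. For option (b) in the quoted list, the sweep itself can be sketched in a few lines of Python. This is only an outline under stated assumptions: `sweep_num_topics` and `score_fn` are hypothetical names, and `score_fn(k)` stands in for whatever you do to train LDA with k topics and read back a corpus log-likelihood (higher is better). Nothing here is Mahout's API.

```python
def sweep_num_topics(candidates, score_fn):
    """Score each candidate topic count and return (best_k, scores).

    score_fn is a hypothetical callable supplied by the caller: it should
    train a model with k topics and return a log-likelihood for the corpus.
    """
    scores = {k: score_fn(k) for k in candidates}
    # Pick the topic count with the highest score.
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Toy stand-in for a real training run; it simply peaks at k = 30.
    def toy_score(k):
        return -abs(k - 30)

    best, all_scores = sweep_num_topics([5, 20, 30, 50, 100], toy_score)
    print(best, all_scores)
```

In practice each `score_fn` call is a full training run, so a short candidate list like the values suggested in (a) keeps the sweep affordable.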
