Re: Process UnStructured Data in Mahout for Clustering
On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh wrote:
> I see the problem is with the way the data is written

What exactly do you mean by this?
Re: Process UnStructured Data in Mahout for Clustering
My experience has been that it's best to do the data processing in Python. I strongly suggest you rewrite your ETL and let Mahout do only the clustering; its built-in vectorization routines are fairly primitive. Then I would wash the features, basically by setting up your own list of stop words and stop phrases, before you let Mahout touch anything.

On Dec 4, 2014, at 8:38 AM, Shahid Shaikh wrote:
> I was looking if there are any known techniques I can follow, specifically while generating vectors.
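As a rough illustration of "washing" the features in Python before Mahout sees them, here is a minimal sketch. The stop-word and stop-phrase lists, and the names `wash` and `wash_corpus`, are hypothetical placeholders (nothing here is a Mahout API); in practice you would build the lists by inspecting your own corpus. The cleaned, space-joined documents could then be fed into whatever vectorization step you use.

```python
import re

# Hypothetical, corpus-specific lists: build these by looking at your own data.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of",
              "in", "on", "for", "thanks", "regards"}
STOP_PHRASES = ["please find attached", "let me know", "thank you"]

# Match whole stop phrases only (word boundaries on both ends).
PHRASE_RE = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in STOP_PHRASES) + r")\b")
# Simple tokenizer: lower-case alphanumerics, keeping contractions like "don't".
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def wash(text):
    """Lower-case, strip stop phrases, tokenize, then drop stop words,
    very short tokens and pure numbers."""
    text = PHRASE_RE.sub(" ", text.lower())
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens
            if t not in STOP_WORDS and len(t) > 2 and not t.isdigit()]


def wash_corpus(docs):
    """Return one cleaned, space-joined string per input document."""
    return [" ".join(wash(d)) for d in docs]


if __name__ == "__main__":
    docs = ["Thank you, the Server crashed AGAIN at 10:45!",
            "Please find attached the server logs."]
    for line in wash_corpus(docs):
        print(line)  # "server crashed again", then "server logs"
```

Phrases are removed before tokenizing so that multi-word boilerplate ("thank you", "please find attached") disappears as a unit rather than leaving stray fragments behind.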
Re: Process UnStructured Data in Mahout for Clustering
Hey Donni, thanks, but I have already used those configurations and obtained the clusters; the results are not promising enough. I was looking if there are any known techniques I can follow, specifically while generating vectors.

Thanks

On Thursday, December 4, 2014, Donni Khan wrote:
> If you have knowledge about your data, you can figure out the results and you can also set the correct parameters to the clustering algorithm like number of topics or number of clusters.

--
Regards,
Shaikh Shahid G.
+91 9503954781
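On the question of techniques while generating vectors: two common ones for free text are TF-IDF weighting with document-frequency pruning (drop terms that are too rare to matter or so common they act as corpus-specific stop words) and length normalization, so that cosine similarity compares topic mix rather than document length. The sketch below shows both in plain Python; the function names and the `min_df` / `max_df_ratio` defaults are illustrative assumptions, not Mahout's implementation. Mahout's own `seq2sparse` job has comparable document-frequency pruning options, so check its help output for the exact parameters.

```python
import math
from collections import Counter


def tfidf_vectors(docs, min_df=2, max_df_ratio=0.5):
    """Build L2-normalised TF-IDF vectors as sparse {term: weight} dicts.

    docs: list of token lists (e.g. the output of a cleaning step).
    min_df: drop terms seen in fewer than this many documents (typos, noise).
    max_df_ratio: drop terms seen in more than this fraction of documents
                  (corpus-specific "stop words" that carry little signal).
    """
    n = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}

    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab)
        # Raw term frequency times inverse document frequency.
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else {})
    return vectors


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b.get(t, 0.0) for t, w in a.items())


if __name__ == "__main__":
    docs = [["server", "crash", "disk"], ["server", "crash", "network"],
            ["invoice", "payment", "late"], ["invoice", "payment", "refund"]]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With a toy corpus like this, the one-off terms ("disk", "late", ...) fall below `min_df` and are dropped, so the two server documents end up identical and orthogonal to the two invoice documents. On real data you would tune both thresholds by looking at which terms survive.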
Re: Process UnStructured Data in Mahout for Clustering
Hi,

It depends on the nature of the data you are clustering. If you have knowledge about your data, you can interpret the results, and you can also set the correct parameters for the clustering algorithm, such as the number of topics or the number of clusters.

Cheers,
Donni

On Thu, Dec 4, 2014 at 2:38 PM, Shahid Shaikh wrote:
> Hi All,
> I have been trying Mahout clustering on unstructured data, i.e. human-written text. I have tried Mahout clustering algorithms like KMeans, Canopy+KMeans and LDA, but the results produced are not helpful.
>
> I see the problem is with the way the data is written. Can someone please provide me some pointers on how to proceed with unstructured data for clustering?
>
> I have also written an analyzer that uses lower-case and stop-word filters.
>
> Thanks :)
>
> Regards,
> Shaikh Shahid G.
> +91 9503954781
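When you do not know the right number of clusters in advance, one common heuristic (the "elbow" method; not something prescribed in this thread) is to run k-means for several values of k and look at where the within-cluster sum of squares (WCSS) stops dropping sharply. The sketch below is a toy, in-memory k-means with squared Euclidean distance, written only to show the idea; `kmeans` and `elbow` are hypothetical helpers, not Mahout's implementation, and for text vectors you would more likely use a cosine-based distance.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        new = []
        for i, members in enumerate(clusters):
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[i])  # keep an empty cluster's old centroid
        if new == centroids:
            break
        centroids = new
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss


def elbow(points, ks):
    """Map each candidate k to its WCSS; look for the bend in this curve."""
    return {k: kmeans(points, k)[1] for k in ks}


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
    for k, w in elbow(pts, range(1, 6)).items():
        print(k, round(w, 2))
```

WCSS always decreases as k grows, so the useful signal is the point where adding another cluster buys much less improvement than the previous one did. Because k-means can land in a local optimum, it is also worth trying several seeds per k.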