Re: Process UnStructured Data in Mahout for Clustering

2014-12-05 Thread Ted Dunning
On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh shaikhshah...@gmail.com
wrote:

 i see the problem is with the way data is written


What exactly do you mean by this?


Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hi All,
   I have been trying Mahout clustering on unstructured data, i.e. human-written
text. I have tried Mahout clustering algorithms such as k-means, Canopy + k-means,
and LDA, but the results produced are not helpful.

I see the problem is with the way data is written. Can someone please
provide me some pointers on how to proceed with unstructured data for
clustering?

I have also written an analyzer that applies lowercase and stop-word filters.

thanks :)


Regards,
Shaikh Shahid G .
+91 9503954781


Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Donni Khan
Hi
It depends on the nature of the data you are clustering. If you know your
data, you can interpret the results, and you can also set the right
parameters for the clustering algorithm, such as the number of topics or
the number of clusters.

Cheers,
Donni


Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hey Donni, thanks, but I have used those configurations and obtained the
clusters. The results are not promising enough. I was wondering whether there
are any known techniques I can follow, specifically when generating vectors.

Thanks

-- 
Regards,
Shaikh Shahid G .
+91 9503954781


Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Brian Dolan
My experience has been that it's best to leave the data processing to Python.
I strongly suggest you rewrite your ETL and let Mahout do only the clustering.
The built-in vectorization routines are fairly primitive.

Then I would wash the features (basically, set up your own list of stop words
or phrases) before you let Mahout do anything.
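
A minimal sketch of that kind of feature washing, in plain Python with only the standard library. The stop-word and stop-phrase lists, the function names `wash` and `tfidf`, and the smoothed TF-IDF weighting are all assumptions made for illustration; they are not taken from this thread or from Mahout. Converting the resulting vectors into whatever input format your Mahout clustering job expects is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical custom lists: extend them with words and phrases that are
# frequent in your own corpus but carry no topical signal.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in", "on", "it"}
STOP_PHRASES = ["thanks in advance", "kind regards"]


def wash(text):
    """Lowercase, remove stop phrases, tokenize, then remove stop words."""
    text = text.lower()
    for phrase in STOP_PHRASES:
        text = text.replace(phrase, " ")
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def tfidf(docs):
    """Return (vocab, vectors): vocab maps term -> index, and each vector is
    a sparse {index: weight} dict scaled to unit L2 length."""
    tokenized = [wash(d) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption; other weightings are equally valid).
        vec = {
            vocab[t]: count * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {i: w / norm for i, w in vec.items()}
        vectors.append(vec)
    return vocab, vectors


if __name__ == "__main__":
    docs = ["The cat sat on the mat. Kind regards", "A dog and a cat"]
    vocab, vectors = tfidf(docs)
    print(vocab)
    print(vectors)
```

Terms that occur in every document (here "cat") end up with a lower weight than terms specific to one document, which is usually what you want before handing the vectors to a clustering algorithm.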
