Re: MAHOUT 0.9 Release - New URL

2014-01-21 Thread Suneel Marthi
Thanks, Andrew M. I see that some of the example scripts need to be fixed, as they
still refer to the deprecated algorithms.
I also see that Streaming KMeans failed for you.

I'll be rolling back the release today to fix these issues.

On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:

Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
Linux AMI from tarball.

All tests pass.

*Output of examples:*
*asf-email-examples.sh, run on mahout.apache.org:*
*recommendations:*
[ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-0 | less
1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
8       [12758:1.0,19409:1.0,2:1.0]
11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
[snip]
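
The recommendation lines above appear to follow a `userID<TAB>[itemID:score,...]` layout (in parts of this archive the tab was flattened into a line break). As a minimal sketch, assuming exactly that layout, one such line could be parsed in Java as follows; `RecommendationLineParser` and its members are hypothetical names, not part of Mahout's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses one "userID<TAB>[itemID:score,...]" output line. */
public final class RecommendationLineParser {

  /** Parsed form of a single output line. */
  public static final class Recommendations {
    public final long userId;
    /** Item ID to score, kept in the order the items appear in the line. */
    public final Map<Long, Double> itemScores;

    private Recommendations(long userId, Map<Long, Double> itemScores) {
      this.userId = userId;
      this.itemScores = itemScores;
    }
  }

  private RecommendationLineParser() {}

  public static Recommendations parse(String line) {
    // Split on the first run of whitespace: user ID on the left, bracketed list on the right.
    String[] parts = line.trim().split("\\s+", 2);
    if (parts.length != 2) {
      throw new IllegalArgumentException("Expected '<userID> [item:score,...]': " + line);
    }
    long userId = Long.parseLong(parts[0]);
    String list = parts[1].trim();
    if (!list.startsWith("[") || !list.endsWith("]")) {
      throw new IllegalArgumentException("Missing brackets around item list: " + line);
    }
    Map<Long, Double> scores = new LinkedHashMap<>();
    String body = list.substring(1, list.length() - 1);
    if (!body.isEmpty()) {
      for (String pair : body.split(",")) {
        String[] kv = pair.split(":", 2);
        if (kv.length != 2) {
          throw new IllegalArgumentException("Bad item:score pair '" + pair + "' in: " + line);
        }
        scores.put(Long.parseLong(kv[0].trim()), Double.parseDouble(kv[1].trim()));
      }
    }
    return new Recommendations(userId, scores);
  }

  public static void main(String[] args) {
    Recommendations r = parse("8\t[12758:1.0,19409:1.0,2:1.0]");
    System.out.println(r.userId + " -> " + r.itemScores);
  }
}
```

Running `main` prints `8 -> {12758=1.0, 19409=1.0, 2=1.0}`. A `LinkedHashMap` is used so the items keep the order in which they appear in the line.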

*clustering; kmeans:*
[snip]
        Weight : [props - optional]:  Point:
        1.0 : [distance-squared=1.0193102046188427]:
/commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
[1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
39789:0.110, 40743:0.190, 45775:0.086]
        1.0 : [distance-squared=0.9823018320457279]:
/commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
[1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
        1.0 : [distance-squared=0.9509142993214911]:
/commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
[648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
[snip]

*clustering; dirichlet:*
I get this complaint:
Running Dirichlet with K = 8
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
Unknown program 'dirichlet' chosen.

*clustering; minhash:*
Running Minhash
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash

Re: MAHOUT 0.9 Release - New URL

2014-01-21 Thread Andrew Musselman
Sure thing; continuing to smoke test the other examples tonight


On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:

 Thanks, Andrew M. I see that some of the example scripts need to be fixed, as
 they still refer to the deprecated algorithms.
 I also see that Streaming KMeans failed for you.

 I'll be rolling back the release today to fix these issues.


Build failed in Jenkins: Mahout-Examples-Classify-20News #401

2014-01-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Classify-20News/401/changes

Changes:

[smarthi] Reverting back to 0.9-SNAPSHOT

--
[...truncated 3525 lines...]
777 494
778 494
779 493
780 492
781 492
782 491
783 491
784 490
785 490
786 489
787 489
788 488
789 488
790 488
791 487
792 487
793 487
794 487
795 487
796 486
797 485
798 485
799 484
800 481
801 481
802 480
803 477
804 477
805 477
806 477
807 477
808 476
809 475
810 475
811 474
812 474
813 474
814 473
815 473
816 473
817 472
818 472
819 471
820 471
821 471
822 470
823 470
824 470
825 470
826 469
827 469
828 468
829 468
830 468
831 466
832 466
833 466
834 465
835 465
836 464
837 464
838 464
839 463
840 463
841 462
842 461
843 461
844 461
845 461
846 461
847 461
848 460
849 460
850 459
851 458
852 458
853 458
854 456
855 455
856 455
857 455
858 454
859 454
860 453
861 453
862 452
863 452
864 452
865 451
866 451
867 451
868 451
869 450
870 450
871 449
872 448
873 448
874 448
875 448
876 447
877 447
878 447
879 446
880 446
881 445
882 445
883 444
884 443
885 443
886 443
887 442
888 442
889 441
890 441
891 441
892 440
893 439
894 438
895 437
896 437
897 437
898 436
899 436
900 436
901 435
902 435
903 434
904 434
905 433
906 433
907 433
908 432
909 432
910 431
911 431
912 431
913 431
914 431
915 430
916 430
917 430
918 428
919 428
920 426
921 425
922 425
923 425
924 425
925 425
926 425
927 424
928 423
929 421
930 421
931 421
932 421
933 418
934 418
935 417
936 416
937 416
938 416
939 415
940 415
941 415
942 415
943 414
944 414
945 414
946 413
947 413
948 412
949 411
950 410
951 410
952 410
953 409
954 408
955 407
956 406
957 406
958 405
959 405
960 405
961 405
962 404
963 404
964 404
965 404
966 404
967 404
968 404
969 404
970 403
971 402
972 402
973 401
974 401
975 401
976 400
977 400
978 399
979 399
980 398
981 397
982 397
983 397
984 396
985 396
986 395
987 395
988 395
989 394
990 394
991 394
992 394
993 393
994 393
995 393
996 393
997 393
998 393
999 393
1000 392
Jan 21, 2014 8:16:46 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 398642 ms (Minutes: 6.6440334)
Testing on /tmp/mahout-work-jenkins/20news-bydate/20news-bydate-test/ with model: /tmp/news-group.model
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Classify-20News/ws/trunk/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Classify-20News/ws/trunk/examples/target/dependency/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
Jan 21, 2014 8:16:46 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.classifier.sgd.TestNewsGroups.props found on classpath, will use command-line arguments only
1 test files
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 12, Size: 1
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.mahout.classifier.sgd.TestNewsGroups.run(TestNewsGroups.java:95)
	at org.apache.mahout.classifier.sgd.TestNewsGroups.main(TestNewsGroups.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
Build step 'Execute shell' marked build as failure
Build step 'Execute shell' marked build as failure


RE: MAHOUT 0.9 Release - New URL

2014-01-21 Thread Andrew Palumbo
From the asf-email-examples.sh script:

# You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
# To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
# http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout

It looks like the http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout link is down.

Is there somewhere else that we can get a subset of the ASF emails?



Date: Tue, 21 Jan 2014 09:48:06 -0800
 Subject: Re: MAHOUT 0.9 Release - New URL
 From: andrew.mussel...@gmail.com
 To: dev@mahout.apache.org
 
 Sure thing; continuing to smoke test the other examples tonight
 
 

[jira] [Updated] (MAHOUT-1398) FileDataModel should provide a constructor with a delimiterPattern

2014-01-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1398:
--

Affects Version/s: (was: 0.9)
   0.8
Fix Version/s: (was: 1.0)
   0.9
 Assignee: Sebastian Schelter

Moving this to 0.9

 FileDataModel should provide a constructor with a delimiterPattern
 --

 Key: MAHOUT-1398
 URL: https://issues.apache.org/jira/browse/MAHOUT-1398
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
Affects Versions: 0.8
Reporter: Roy Guo
Assignee: Sebastian Schelter
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1398.patch


 For now we only have ',' and '\t' as delimiters, which is really not enough
 for users.
 Of course, users can override processLine etc. to achieve their goal (e.g.
 use four spaces as the delimiter pattern), but as a well-designed framework,
 Mahout should account for the varied demands of most users and be very easy to
 use.
 Also, it will not take much time to implement; can I submit a patch for this?
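
The request amounts to splitting each preference line on a caller-supplied regular expression rather than a fixed `,` or `\t`. Below is a minimal standalone sketch of that idea in Java, assuming a `userID<delim>itemID[<delim>pref]` line layout. `PatternPrefParser` is a hypothetical class for illustration and is not necessarily what the attached MAHOUT-1398.patch adds to `FileDataModel`.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: splits preference lines on a configurable delimiter regex. */
public final class PatternPrefParser {

  /** One parsed preference line. */
  public static final class Pref {
    public final long userId;
    public final long itemId;
    public final float value;

    private Pref(long userId, long itemId, float value) {
      this.userId = userId;
      this.itemId = itemId;
      this.value = value;
    }
  }

  private final Pattern delimiter;

  /** @param delimiterRegex e.g. "," or "\t" or " {4}" (exactly four spaces) */
  public PatternPrefParser(String delimiterRegex) {
    this.delimiter = Pattern.compile(delimiterRegex);
  }

  /** Parses "userID<delim>itemID[<delim>pref]"; a missing pref is treated as 1.0 (an assumption). */
  public Pref parse(String line) {
    String[] tokens = delimiter.split(line.trim());
    if (tokens.length < 2) {
      throw new IllegalArgumentException("Expected at least userID and itemID: " + line);
    }
    long user = Long.parseLong(tokens[0].trim());
    long item = Long.parseLong(tokens[1].trim());
    float value = tokens.length > 2 ? Float.parseFloat(tokens[2].trim()) : 1.0f;
    return new Pref(user, item, value);
  }

  public static void main(String[] args) {
    PatternPrefParser fourSpaces = new PatternPrefParser(" {4}");
    Pref p = fourSpaces.parse("42    1001    3.5");
    System.out.println(p.userId + "," + p.itemId + "," + p.value);
  }
}
```

With the four-space pattern from the issue, `main` prints `42,1001,3.5`. Defaulting a missing preference to 1.0 is a choice made here for boolean-style data, not something the issue specifies.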



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1400:
-

 Summary: Remove references to deprecated and removed algorithms from examples scripts
 Key: MAHOUT-1400
 URL: https://issues.apache.org/jira/browse/MAHOUT-1400
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 0.9


We still see references to old clustering algorithms such as Minhash and Dirichlet in
asf-email-examples.sh and cluster-syntheticcontrol.sh.

Also remove build-asf-email.sh and build-cluster-syntheticcontrol.sh from
examples/bin.
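
Finding the leftover references described in this issue is essentially a case-insensitive text search over the example scripts. Here is a minimal Java sketch, assuming the scripts live under `examples/bin` and that a removed algorithm can be recognized by the names `dirichlet` and `minhash` (taken from the issue text; the list is not exhaustive, and `DeprecatedAlgorithmScan` is a hypothetical name).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: reports script lines that mention removed algorithms. */
public final class DeprecatedAlgorithmScan {

  // Names taken from the issue text; extend as needed (assumption, not an official list).
  private static final String[] REMOVED = {"dirichlet", "minhash"};

  private DeprecatedAlgorithmScan() {}

  /** Returns "lineNo: text" for each line mentioning a removed algorithm (case-insensitive). */
  public static List<String> findReferences(List<String> lines) {
    List<String> hits = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      String lower = lines.get(i).toLowerCase(Locale.ROOT);
      for (String name : REMOVED) {
        if (lower.contains(name)) {
          hits.add((i + 1) + ": " + lines.get(i).trim());
          break; // report each line only once
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Paths.get(args.length > 0 ? args[0] : "examples/bin");
    try (DirectoryStream<Path> scripts = Files.newDirectoryStream(dir, "*.sh")) {
      for (Path script : scripts) {
        for (String hit : findReferences(Files.readAllLines(script, StandardCharsets.UTF_8))) {
          System.out.println(script.getFileName() + ":" + hit);
        }
      }
    }
  }
}
```

Run it from the Mahout source root, optionally passing another directory as the first argument. Each hit is printed as `script:line: text`.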





[jira] [Assigned] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1400:
-

Assignee: Sebastian Schelter  (was: Suneel Marthi)

 Remove references to deprecated and removed algorithms from examples scripts
 

 Key: MAHOUT-1400
 URL: https://issues.apache.org/jira/browse/MAHOUT-1400
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Sebastian Schelter
 Fix For: 0.9


 We still see references to old clustering algorithms such as Minhash and Dirichlet in
 asf-email-examples.sh and cluster-syntheticcontrol.sh.
 Also remove build-asf-email.sh and build-cluster-syntheticcontrol.sh from
 examples/bin.





[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877844#comment-13877844
 ] 

Sebastian Schelter commented on MAHOUT-1400:


I will remove the asf-email-examples script for this release, as (1) it still
has references to the deprecated algorithms and (2) the sample data is not
available at the given address.

We should definitely fix and reintroduce it for the next release.



[jira] [Resolved] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter resolved MAHOUT-1400.


Resolution: Fixed



Re: MAHOUT 0.9 Release - New URL

2014-01-21 Thread Suneel Marthi
Thanks, Andrew, for reporting that. I rolled back the release to fix this and a few
other issues.

We have removed asf-examples*.sh from trunk, as the sample file at the URL
mentioned in your email is not available.
This is something we need to fix and restore in 1.0.

On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo ap@outlook.com wrote:
 
From the asf-email-examples.sh script:

# You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
# To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
# http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout

It looks like the http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout link is down.

Is there somewhere else that we can get a subset of the ASF emails?


Jenkins build is back to normal : Mahout-Examples-Classify-20News #402

2014-01-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Classify-20News/402/changes



[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877872#comment-13877872
 ] 

Hudson commented on MAHOUT-1400:


SUCCESS: Integrated in Mahout-Quality #2428 (See [https://builds.apache.org/job/Mahout-Quality/2428/])
MAHOUT-1400 Remove references to deprecated and removed algorithms from examples scripts (ssc: rev 1560185)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/examples/bin/README.txt
MAHOUT-1400 Remove references to deprecated and removed algorithms from examples scripts (ssc: rev 1560178)
* /mahout/trunk/examples/bin/asf-email-examples.sh
* /mahout/trunk/examples/bin/build-asf-email.sh
* /mahout/trunk/examples/bin/build-cluster-syntheticcontrol.sh
* /mahout/trunk/examples/bin/cluster-syntheticcontrol.sh




[jira] [Resolved] (MAHOUT-1398) FileDataModel should provide a constructor with a delimiterPattern

2014-01-21 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter resolved MAHOUT-1398.


Resolution: Fixed

 FileDataModel should provide a constructor with a delimiterPattern
 --

 Key: MAHOUT-1398
 URL: https://issues.apache.org/jira/browse/MAHOUT-1398
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
Affects Versions: 0.8
Reporter: Roy Guo
Assignee: Sebastian Schelter
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1398.patch


 For now we only support ',' and '\t' as delimiters, which is really not 
 enough for users.
 Of course users can override processLine etc. to achieve their goal (e.g. 
 use four spaces as the delimiter pattern), but as a well-designed framework, 
 Mahout should accommodate the varied demands of most users and make it very 
 easy to use.
 Also, it will not take much time to implement. Can I push a patch for this?
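A minimal sketch of the idea follows. It assumes a preference file whose lines have the form `userID<delim>itemID<delim>preference`. The delimiter is passed in as a regular expression and compiled once with `java.util.regex.Pattern`, so four spaces, a pipe, or any run of whitespace can be handled without overriding `processLine`. The class `DelimitedPreferenceParser` and its methods are hypothetical. They illustrate the request and are not the constructor that was committed for this issue.

```java
import java.util.regex.Pattern;

/** Hypothetical parser illustrating a configurable delimiter pattern (not Mahout's FileDataModel). */
public final class DelimitedPreferenceParser {

  /** A parsed userID / itemID / preference triple. */
  public static final class Preference {
    public final long userID;
    public final long itemID;
    public final float value;

    Preference(long userID, long itemID, float value) {
      this.userID = userID;
      this.itemID = itemID;
      this.value = value;
    }
  }

  private final Pattern delimiter;

  /** @param delimiterRegex regular expression separating fields, e.g. " {4}" or "\\s+" */
  public DelimitedPreferenceParser(String delimiterRegex) {
    // Compile once; the same Pattern is reused for every line.
    this.delimiter = Pattern.compile(delimiterRegex);
  }

  /** Parses one line of the form userID<delim>itemID<delim>preference. */
  public Preference parse(String line) {
    String[] tokens = delimiter.split(line.trim());
    if (tokens.length < 3) {
      throw new IllegalArgumentException("Expected 3 fields but got " + tokens.length + ": " + line);
    }
    return new Preference(
        Long.parseLong(tokens[0]),
        Long.parseLong(tokens[1]),
        Float.parseFloat(tokens[2]));
  }

  public static void main(String[] args) {
    // Four spaces as the delimiter, the case mentioned in the issue.
    DelimitedPreferenceParser parser = new DelimitedPreferenceParser(" {4}");
    Preference p = parser.parse("42    1001    4.5");
    System.out.println(p.userID + " " + p.itemID + " " + p.value);
  }
}
```

Because the delimiter is a regex, characters such as `|` must be escaped (`"\\|"`). A pattern like `"\\s+"` would accept tabs and any number of spaces in one go.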





Re: MAHOUT 0.9 Release - New URL

2014-01-21 Thread Andrew Musselman
*classify-20newsgroups.sh*

*Complementary naive bayes:*
===
Summary
---
Correctly Classified Instances  :  11207   98.9406%
Incorrectly Classified Instances:    120    1.0594%
Total Classified Instances      :  11327

===
Confusion Matrix
---
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t    <--Classified as
475 0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0    |  478  a = alt.atheism
0   597 1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0    |  605  b = comp.graphics
0   1   620 3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0    |  627  c = comp.os.ms-windows.misc
1   1   1   593 2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0    |  599  d = comp.sys.ibm.pc.hardware
0   1   1   0   568 0   1   0   0   0   1   1   2   0   0   0   0   1   0   0    |  576  e = comp.sys.mac.hardware
0   4   2   0   0   581 0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  587  f = comp.windows.x
0   0   0   1   2   0   571 3   0   0   1   1   4   1   0   0   0   0   0   0    |  584  g = misc.forsale
0   0   0   1   0   0   0   589 1   0   0   1   1   0   0   0   0   0   0   0    |  593  h = rec.autos
0   0   0   0   0   0   0   1   565 0   0   0   0   0   1   0   0   0   0   0    |  567  i = rec.motorcycles
0   0   0   0   0   0   0   0   0   600 2   0   0   0   1   0   0   0   0   0    |  603  j = rec.sport.baseball
0   0   0   0   0   0   0   0   0   1   584 0   0   0   0   0   0   0   0   0    |  585  k = rec.sport.hockey
0   0   0   0   0   0   0   0   0   0   0   579 0   0   0   0   0   1   0   0    |  580  l = sci.crypt
0   0   0   1   3   0   2   0   0   2   0   0   567 1   2   1   0   0   0   0    |  579  m = sci.electronics
0   0   0   0   0   0   0   0   0   0   0   0   1   605 0   0   0   0   0   0    |  606  n = sci.med
0   0   0   0   0   0   0   0   0   0   0   0   0   0   602 0   0   0   0   0    |  602  o = sci.space
0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   602 0   0   1   0    |  604  p = soc.religion.christian
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   556 0   0   0    |  556  q = talk.politics.mideast
0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0   568 0   0    |  571  r = talk.politics.guns
11  0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4   338 2    |  369  s = talk.religion.misc
0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0   447  |  456  t = talk.politics.misc

===
Statistics
---
Kappa                              0.9806
Accuracy                          98.9406%
Reliability                       94.0932%
Reliability (standard deviation)   0.2163

Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 15870 ms (Minutes: 
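The summary figures can be cross-checked against the confusion matrix. Accuracy is the diagonal total divided by the grand total (11207 / 11327, about 98.94%). Cohen's kappa then corrects that figure for chance agreement, using the row and column totals.

The sketch below applies the textbook Cohen's kappa formula. It is an illustration, not Mahout's `ConfusionMatrix` code. Mahout's own computation may differ in detail, so this sketch is not guaranteed to reproduce the 0.9806 printed above. The class name `ConfusionStats` is hypothetical.

```java
/** Hypothetical helper: accuracy and Cohen's kappa from a square confusion matrix. */
public final class ConfusionStats {

  private ConfusionStats() {}

  /** Sum of all cells. */
  static long total(long[][] m) {
    long n = 0;
    for (long[] row : m) {
      for (long v : row) {
        n += v;
      }
    }
    return n;
  }

  /** Fraction of instances on the diagonal (rows = actual label, columns = predicted label). */
  public static double accuracy(long[][] m) {
    long correct = 0;
    for (int i = 0; i < m.length; i++) {
      correct += m[i][i];
    }
    return (double) correct / total(m);
  }

  /** Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e comes from the row and column marginals. */
  public static double cohensKappa(long[][] m) {
    double n = total(m);
    double expected = 0.0;
    for (int i = 0; i < m.length; i++) {
      double rowSum = 0.0;
      double colSum = 0.0;
      for (int j = 0; j < m.length; j++) {
        rowSum += m[i][j];
        colSum += m[j][i];
      }
      // Chance that a random actual/predicted pair both fall on label i.
      expected += (rowSum / n) * (colSum / n);
    }
    double observed = accuracy(m);
    return (observed - expected) / (1.0 - expected);
  }

  public static void main(String[] args) {
    long[][] m = {{45, 5}, {5, 45}};
    System.out.println("accuracy = " + accuracy(m));
    System.out.println("kappa    = " + cohensKappa(m));
  }
}
```

For the balanced 2x2 example, chance agreement is 0.5, so an accuracy of 0.9 corresponds to a kappa of 0.8. With many roughly equal-sized classes, as in 20 Newsgroups, chance agreement is small, so kappa stays close to accuracy. Kappa is undefined (NaN) when only one label occurs.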

[jira] [Commented] (MAHOUT-1398) FileDataModel should provide a constructor with a delimiterPattern

2014-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877924#comment-13877924
 ] 

Hudson commented on MAHOUT-1398:


SUCCESS: Integrated in Mahout-Quality #2429 (See 
[https://builds.apache.org/job/Mahout-Quality/2429/])
MAHOUT-1398 FileDataModel should provide a constructor with a delimiterPattern 
(ssc: rev 1560202)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/model/file/FileDataModel.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/cf/taste/impl/model/file/FileDataModelTest.java


 FileDataModel should provide a constructor with a delimiterPattern
 --

 Key: MAHOUT-1398
 URL: https://issues.apache.org/jira/browse/MAHOUT-1398
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
Affects Versions: 0.8
Reporter: Roy Guo
Assignee: Sebastian Schelter
Priority: Minor
 Fix For: 0.9

 Attachments: MAHOUT-1398.patch


 For now we only support ',' and '\t' as delimiters, which is really not 
 enough for users.
 Of course users can override processLine etc. to achieve their goal (e.g. 
 use four spaces as the delimiter pattern), but as a well-designed framework, 
 Mahout should accommodate the varied demands of most users and make it very 
 easy to use.
 Also, it will not take much time to implement. Can I push a patch for this?





[jira] [Work started] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1401 started by Suneel Marthi.

 Resurrect Frequent Pattern mining
 -

 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical







[jira] [Created] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1401:
-

 Summary: Resurrect Frequent Pattern mining
 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical








[jira] [Created] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh

2014-01-21 Thread Andrew Musselman (JIRA)
Andrew Musselman created MAHOUT-1402:


 Summary: Zero clusters using streaming k-means option in 
cluster-reuters.sh
 Key: MAHOUT-1402
 URL: https://issues.apache.org/jira/browse/MAHOUT-1402
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.8
 Environment: AWS default Linux AMI
Reporter: Andrew Musselman
 Fix For: 0.9


Running cluster-reuters.sh in examples/bin results in this:

[snip]
INFO: Number of Centroids: 0
Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local23982482_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.00149011612, 0]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

[snip]

WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.00
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 535 ms (Minutes: 0.008916)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train





[jira] [Commented] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Yoonmin Nam (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878264#comment-13878264
 ] 

Yoonmin Nam commented on MAHOUT-1401:
-

I believe we must focus on the availability of FPM algorithm during the 
resurrection.



 Resurrect Frequent Pattern mining
 -

 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical







[jira] [Commented] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878269#comment-13878269
 ] 

Suneel Marthi commented on MAHOUT-1401:
---

Sorry, could you explain what you mean by 'availability of FPM algorithm'? I'm 
not sure I get what you are trying to convey.

 Resurrect Frequent Pattern mining
 -

 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical







[jira] [Commented] (MAHOUT-1296) Remove deprecated algorithms

2014-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878274#comment-13878274
 ] 

Hudson commented on MAHOUT-1296:


FAILURE: Integrated in Mahout-Quality #2430 (See 
[https://builds.apache.org/job/Mahout-Quality/2430/])
MAHOUT-1296: /examples/src/main/java/org/apache/mahout/fpm directory should 
have been deleted as part of this jira. (smarthi: rev 1560250)
* /mahout/trunk/examples/src/main/java/org/apache/mahout/fpm


 Remove deprecated algorithms
 

 Key: MAHOUT-1296
 URL: https://issues.apache.org/jira/browse/MAHOUT-1296
 Project: Mahout
  Issue Type: Improvement
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
 Fix For: 0.9


 Remove the algorithms we chose to deprecate in MAHOUT-1250





Build failed in Jenkins: Mahout-Quality #2430

2014-01-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2430/changes

Changes:

[smarthi] MAHOUT-1296: /examples/src/main/java/org/apache/mahout/fpm directory 
should have been deleted as part of this jira.

--
[...truncated 4251 lines...]
Running org.apache.mahout.math.neighborhood.SearchQualityTest
Running org.apache.mahout.math.neighborhood.SearchSanityTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.919 sec - 
in org.apache.mahout.math.hadoop.TestDistributedRowMatrix
Running org.apache.mahout.math.ssvd.SequentialOutOfCoreSvdTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.993 sec - in 
org.apache.mahout.math.hadoop.solver.TestDistributedConjugateGradientSolverCLI
Running org.apache.mahout.math.stats.OnlineAucTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.64 sec - in 
org.apache.mahout.math.hadoop.solver.TestDistributedConjugateGradientSolver
Running org.apache.mahout.math.stats.SamplerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.372 sec - in 
org.apache.mahout.math.stats.SamplerTest
Running org.apache.mahout.math.VectorWritableTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.488 sec - in 
org.apache.mahout.math.stats.OnlineAucTest
Running org.apache.mahout.cf.taste.common.CommonTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.566 sec - in 
org.apache.mahout.cf.taste.common.CommonTest
Tests run: 100, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.718 sec - 
in org.apache.mahout.math.VectorWritableTest
Running org.apache.mahout.cf.taste.hadoop.TopItemsQueueTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 40.136 sec - in 
org.apache.mahout.classifier.sgd.OnlineLogisticRegressionTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.33 sec - in 
org.apache.mahout.cf.taste.hadoop.TopItemsQueueTest
Running org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducerTest
Running org.apache.mahout.cf.taste.hadoop.item.RecommenderJobTest
Running org.apache.mahout.cf.taste.hadoop.TasteHadoopUtilsTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.034 sec - in 
org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDSolverSparseSequentialTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.632 sec - in 
org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.336 sec - in 
org.apache.mahout.cf.taste.hadoop.TasteHadoopUtilsTest
Running org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJobTest
Running org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJobTest
Running org.apache.mahout.cf.taste.impl.common.RefreshHelperTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.632 sec - in 
org.apache.mahout.cf.taste.impl.common.RefreshHelperTest
Running org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.526 sec - in 
org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.CacheTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.285 sec - in 
org.apache.mahout.cf.taste.impl.common.CacheTest
Running org.apache.mahout.cf.taste.impl.common.RunningAverageTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.328 sec - in 
org.apache.mahout.cf.taste.impl.common.RunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.FastMapTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.175 sec - in 
org.apache.mahout.cf.taste.impl.common.FastMapTest
Running org.apache.mahout.cf.taste.impl.common.BitSetTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.303 sec - in 
org.apache.mahout.cf.taste.impl.common.BitSetTest
Running org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.079 sec - in 
org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Running org.apache.mahout.cf.taste.impl.common.LongPrimitiveArrayIteratorTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.331 sec - in 
org.apache.mahout.cf.taste.impl.common.LongPrimitiveArrayIteratorTest
Running org.apache.mahout.cf.taste.impl.common.WeightedRunningAverageTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.983 sec - in 
org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDPCASparseTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.292 sec - in 
org.apache.mahout.cf.taste.impl.common.WeightedRunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.RunningAverageAndStdDevTest
Running org.apache.mahout.cf.taste.impl.common.FastIDSetTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.434 sec - in 

[jira] [Commented] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Yoonmin Nam (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878282#comment-13878282
 ] 

Yoonmin Nam commented on MAHOUT-1401:
-

I mean that the first, and most crucial, problem with the current 
implementation of the FPM algorithm is the explosion of intermediate data when 
generating frequent patterns.
This explosion makes FPM unusable in many cases, even with very small input.
The problem comes from the limited intermediate buffer in MapReduce, but since 
we are considering FPM at the algorithm level, we should find alternatives that 
either avoid or handle this problem.
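The sketch below makes the intermediate-data concern concrete. It models the map-side step of parallel FP-growth (PFP), the approach the `org.apache.mahout.fpm.pfpgrowth` package is named after. Each transaction, already sorted by descending item frequency, is emitted once per item group it touches. The emitted record is the prefix ending at the last item of that group.

The grouping rule (`item % numGroups`) and the class name are assumptions made for illustration. This is not Mahout's `ParallelFPGrowthMapper`. It only shows how a single input transaction can be copied up to `numGroups` times into the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of PFP-style group-dependent transaction emission. */
public final class GroupDependentTransactions {

  private GroupDependentTransactions() {}

  /**
   * @param transaction item IDs already sorted by descending global frequency
   * @param numGroups   number of item groups (assumed grouping: item % numGroups)
   * @return map from group ID to the transaction prefix emitted for that group
   */
  public static Map<Integer, List<Integer>> emit(List<Integer> transaction, int numGroups) {
    Map<Integer, List<Integer>> out = new LinkedHashMap<Integer, List<Integer>>();
    Set<Integer> seenGroups = new HashSet<Integer>();
    // Walk from the least frequent item backwards; the first time a group is seen,
    // emit the prefix that ends at that item (everything a group's FP-tree needs).
    for (int j = transaction.size() - 1; j >= 0; j--) {
      int group = Math.floorMod(transaction.get(j), numGroups);
      if (seenGroups.add(group)) {
        out.put(group, new ArrayList<Integer>(transaction.subList(0, j + 1)));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> transaction = Arrays.asList(0, 1, 2, 3, 4, 5);
    Map<Integer, List<Integer>> emitted = emit(transaction, 3);
    int emittedItems = 0;
    for (List<Integer> prefix : emitted.values()) {
      emittedItems += prefix.size();
    }
    System.out.println(emitted);
    System.out.println("input items: " + transaction.size() + ", emitted items: " + emittedItems);
  }
}
```

Take the six-item transaction and three groups used in `main`. The mapper emits prefixes of length 6, 5 and 4, which is 15 items for 6 input items. Long transactions combined with many groups make this blow-up worse.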

 Resurrect Frequent Pattern mining
 -

 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical







[jira] [Comment Edited] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Yoonmin Nam (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878282#comment-13878282
 ] 

Yoonmin Nam edited comment on MAHOUT-1401 at 1/22/14 5:34 AM:
--

I mean that the first, and most crucial, problem with the current 
implementation of the FPM algorithm is the explosion of intermediate data when 
generating frequent patterns.
This explosion makes FPM unusable in many cases, even with very small input.
The problem comes from the limited intermediate buffer in MapReduce, but since 
we are considering FPM at the algorithm level, we should find alternatives that 
either avoid or handle this problem.


was (Author: ronymin):
I mean that the very first, but crucial problem of current implementation of 
FPM algorithm is the explosion of intermediate data when generating frequent 
patterns.
Above explosion makes FPM not available in many cases even we use very small 
input.
That problem comes from the shortage of intermediate buffer of MapReduce, but 
as we consider the FPM in the algorithm-level, so we should find the another 
alternatives either avoid or handle that problem I mentioned. 

 Resurrect Frequent Pattern mining
 -

 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical







[jira] [Commented] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878287#comment-13878287
 ] 

Suneel Marthi commented on MAHOUT-1401:
---

We don't have time in the 0.9 release to fix the implementation; that's 
definitely something that needs to be fixed in future releases. Let me go 
ahead and resurrect the deleted code for now.

 Resurrect Frequent Pattern mining
 -

 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical







[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878293#comment-13878293
 ] 

Suneel Marthi commented on MAHOUT-1400:
---

factorize-netflix.sh: References a dataset that Netflix took down after the 
competition and that is no longer available. Should we retire this script too?

 Remove references to deprecated and removed algorithms from examples scripts
 

 Key: MAHOUT-1400
 URL: https://issues.apache.org/jira/browse/MAHOUT-1400
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Sebastian Schelter
 Fix For: 0.9


 Still see references to old clustering algorithms like Minhash, Dirichlet in 
 asf-email-examples.sh and cluster-syntheticcontrol.sh.
 Also remove build-asf-email.sh and build-cluster-syntheticcontrol.sh from 
 examples/bin.





[jira] [Resolved] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1401.
---

   Resolution: Fixed
Fix Version/s: 0.9

Code committed back into trunk.

 Resurrect Frequent Pattern mining
 -

 Key: MAHOUT-1401
 URL: https://issues.apache.org/jira/browse/MAHOUT-1401
 Project: Mahout
  Issue Type: Bug
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Critical
 Fix For: 0.9








[jira] [Assigned] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh

2014-01-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1402:
-

Assignee: Suneel Marthi

 Zero clusters using streaming k-means option in cluster-reuters.sh
 --

 Key: MAHOUT-1402
 URL: https://issues.apache.org/jira/browse/MAHOUT-1402
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.8
 Environment: AWS default Linux AMI
Reporter: Andrew Musselman
Assignee: Suneel Marthi
 Fix For: 0.9


 Running cluster-reuters.sh in examples/bin results in this:
 [snip]
 INFO: Number of Centroids: 0
 Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
 WARNING: job_local23982482_0001
 java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.00149011612, 0]
     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
 [snip]
 WARNING: No qualcluster.props found on classpath, will use command-line arguments only
 Num clusters: 0; maxDistance: 0.00
 [Dunn Index] First: Infinity
 [Davies-Bouldin Index] First: NaN
 Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
 INFO: Program took 535 ms (Minutes: 0.008916)
 cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train





[jira] [Commented] (MAHOUT-1402) Zero clusters using streaming k-means option in cluster-reuters.sh

2014-01-21 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878304#comment-13878304
 ] 

Suneel Marthi commented on MAHOUT-1402:
---

The MR version of Streaming KMeans seems to be failing (the sequential mode 
passes) because the reducer is reading zero centroids from the mappers; I need 
to investigate what's going on.
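The exception in the reported log follows directly from this. When the reducer receives zero centroids, any train/test split of them is empty, so the argument check in `BallKMeans.splitTrainTest` fails.

The raw `%.1f`, `%%` and `%d` in that message, with the values appended in brackets, are expected as well. Guava's `Preconditions.checkArgument` only substitutes `%s` placeholders and appends any unmatched arguments in square brackets.

Below is a minimal sketch of a guarded split. The class name, the method name and the rule of putting the last `round(fraction * n)` items in the test set are all assumptions for illustration, not Mahout's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical train/test split with an explicit guard (not Mahout's BallKMeans). */
public final class TrainTestSplit {

  private TrainTestSplit() {}

  /** Holds the two halves of a split. */
  public static final class Split<T> {
    public final List<T> train;
    public final List<T> test;

    Split(List<T> train, List<T> test) {
      this.train = train;
      this.test = test;
    }
  }

  /**
   * Puts the last round(testFraction * n) items in the test set.
   * Fails fast, like the precondition in the log, if either side would be empty.
   */
  public static <T> Split<T> split(List<T> data, double testFraction) {
    int numTest = (int) Math.round(testFraction * data.size());
    int numTrain = data.size() - numTest;
    if (numTest <= 0 || numTrain <= 0) {
      // String.format understands %.1f, %% and %d, unlike Guava's %s-only templates.
      throw new IllegalArgumentException(String.format(
          "Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test",
          testFraction * 100, data.size()));
    }
    return new Split<T>(new ArrayList<T>(data.subList(0, numTrain)),
        new ArrayList<T>(data.subList(numTrain, data.size())));
  }

  public static void main(String[] args) {
    Split<Integer> ok = split(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 0.1);
    System.out.println(ok.train.size() + " train, " + ok.test.size() + " test");
    try {
      split(new ArrayList<Integer>(), 0.1); // zero centroids, as in MAHOUT-1402
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The guard only makes the failure readable. The underlying fix still has to explain why the mappers emit no centroids in MR mode.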

 Zero clusters using streaming k-means option in cluster-reuters.sh
 --

 Key: MAHOUT-1402
 URL: https://issues.apache.org/jira/browse/MAHOUT-1402
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.8
 Environment: AWS default Linux AMI
Reporter: Andrew Musselman
Assignee: Suneel Marthi
 Fix For: 0.9


 Running cluster-reuters.sh in examples/bin results in this:
 [snip]
 INFO: Number of Centroids: 0
 Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
 WARNING: job_local23982482_0001
 java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.00149011612, 0]
     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
 [snip]
 WARNING: No qualcluster.props found on classpath, will use command-line arguments only
 Num clusters: 0; maxDistance: 0.00
 [Dunn Index] First: Infinity
 [Davies-Bouldin Index] First: NaN
 Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
 INFO: Program took 535 ms (Minutes: 0.008916)
 cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train





[jira] [Commented] (MAHOUT-1401) Resurrect Frequent Pattern mining

2014-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878330#comment-13878330
 ] 

Hudson commented on MAHOUT-1401:


SUCCESS: Integrated in Mahout-Quality #2431 (See 
[https://builds.apache.org/job/Mahout-Quality/2431/])
MAHOUT-1401: Resurrecting Frequent Pattern Mining (smarthi: rev 1560259)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/AggregatorMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/AggregatorReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/CountDescendingPairComparator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/MultiTransactionTreeIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowth.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/ParallelCountingMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/ParallelCountingReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/ParallelFPGrowthCombiner.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/ParallelFPGrowthMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/ParallelFPGrowthReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/TransactionTree.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/TransactionTreeIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/ContextStatusUpdater.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/ContextWriteOutputCollector.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/SequenceFileOutputCollector.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/StatusUpdater.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/TopKPatternsOutputConverter.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/TransactionIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/integer/IntegerStringOutputConverter.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/string/StringOutputConverter.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/convertors/string/TopKStringPatterns.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/FPGrowth.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/FPTree.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/FPTreeDepthCache.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/FrequentPatternMaxHeap.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/LeastKCache.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/Pattern.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth2/FPGrowthIds.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth2/FPGrowthObj.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth2/FPTree.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthRetailDataTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthRetailDataTest2.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthRetailDataTestVs.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthSyntheticDataTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/FPGrowthTest2.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowthRetailDataTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowthRetailDataTest2.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowthRetailDataTestVs.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowthSynthDataTest2.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowthTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/PFPGrowthTest2.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/TransactionTreeTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/fpm/pfpgrowth/fpgrowth/FrequentPatternMaxHeapTest.java
* 

Jenkins build is back to normal : Mahout-Quality #2431

2014-01-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2431/changes



[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878339#comment-13878339
 ] 

Sebastian Schelter commented on MAHOUT-1400:


We should keep the example, the Netflix dataset is still regularly used in 
research papers.

 Remove references to deprecated and removed algorithms from examples scripts
 

 Key: MAHOUT-1400
 URL: https://issues.apache.org/jira/browse/MAHOUT-1400
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Sebastian Schelter
 Fix For: 0.9


 Still see references to old clustering algorithms like Minhash, Dirichlet in 
 asf-email-examples.sh and cluster-syntheticcontrol.sh.
 Also remove build-asf-email.sh and build-cluster-syntheticcontrol.sh from 
 examples/bin.





[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878346#comment-13878346
 ] 

Suneel Marthi commented on MAHOUT-1400:
---

Do we have a copy of the dataset someplace? If so, we could add a reference to 
it in the script.



[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13878349#comment-13878349
 ] 

Andrew Musselman commented on MAHOUT-1400:
--

The ASF email dataset is usable via the AWS volume; perhaps the Netflix set can 
live in a snapshot too.



[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13878364#comment-13878364
 ] 

Sean Owen commented on MAHOUT-1400:
---

We don't automatically have permission to redistribute any dataset, even if 
it's still distributed publicly. In this case, I'm sure Netflix won't or can't 
grant that permission anyway. I would not host this or any data set via Apache 
unless it's clearly public domain, licensed appropriately (AL2 or appropriate 
Creative Commons) or permission has been given explicitly.



[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13878366#comment-13878366
 ] 

Sebastian Schelter commented on MAHOUT-1400:


Completely agree with that.



[jira] [Issue Comment Deleted] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1400:
--

Comment: was deleted

(was: Do we have a copy of the dataset someplace, we could add a reference to 
that in the script?)



[jira] [Commented] (MAHOUT-1400) Remove references to deprecated and removed algorithms from examples scripts

2014-01-21 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13878370#comment-13878370
 ] 

Suneel Marthi commented on MAHOUT-1400:
---

+1 to Sean's comment.
