bobbai00 commented on code in PR #3715:
URL: https://github.com/apache/texera/pull/3715#discussion_r2395286324


##########
core/file-service/src/main/scala/edu/uci/ics/texera/service/resource/DatasetResource.scala:
##########
@@ -239,43 +239,70 @@ class DatasetResource {
 
     withTransaction(context) { ctx =>
       val uid = user.getUid
-      val datasetDao: DatasetDao = new DatasetDao(ctx.configuration())
       val datasetUserAccessDao: DatasetUserAccessDao = new 
DatasetUserAccessDao(ctx.configuration())
 
       val datasetName = request.datasetName
       val datasetDescription = request.datasetDescription
       val isDatasetPublic = request.isDatasetPublic
       val isDatasetDownloadable = request.isDatasetDownloadable
 
-      // Check if a dataset with the same name already exists
-      if (!datasetDao.fetchByName(datasetName).isEmpty) {
-        throw new BadRequestException("Dataset with the same name already 
exists")
-      }
-
-      // Initialize the repository in LakeFS
+      // validate dataset name
       try {
-        LakeFSStorageClient.initRepo(datasetName)
+        LakeFSStorageClient.validateRepositoryName(datasetName)

Review Comment:
   Semantically,
   - datasetName is user given
   - repositoryName is allocated by internal services.
   Therefore, although LakeFS has a good validation rule, it is misleading to 
call `LakeFSStorageClient.validateRepositoryName()` on the `datasetName`. 
   
   You can do the following here:
   - create a static function in `DatasetResource` named `validateDatasetName`. 
This function does the exact same thing as `validateRepositoryName`. 
   - call validateDatasetName here.
   This duplication is meaningful as it makes the semantic consistent. Also it 
provides the flexibility for us to modify the validation rule later, without 
affecting the lakefs repository name validation



##########
core/file-service/src/main/scala/edu/uci/ics/texera/service/resource/DatasetResource.scala:
##########
@@ -239,43 +239,70 @@ class DatasetResource {
 
     withTransaction(context) { ctx =>
       val uid = user.getUid
-      val datasetDao: DatasetDao = new DatasetDao(ctx.configuration())
       val datasetUserAccessDao: DatasetUserAccessDao = new 
DatasetUserAccessDao(ctx.configuration())
 
       val datasetName = request.datasetName
       val datasetDescription = request.datasetDescription
       val isDatasetPublic = request.isDatasetPublic
       val isDatasetDownloadable = request.isDatasetDownloadable
 
-      // Check if a dataset with the same name already exists
-      if (!datasetDao.fetchByName(datasetName).isEmpty) {
-        throw new BadRequestException("Dataset with the same name already 
exists")
-      }
-
-      // Initialize the repository in LakeFS
+      // validate dataset name
       try {
-        LakeFSStorageClient.initRepo(datasetName)
+        LakeFSStorageClient.validateRepositoryName(datasetName)

Review Comment:
   Semantically,
   - datasetName is user given
   - repositoryName is allocated by internal services.
   Therefore, although LakeFS has a good validation rule, it is misleading to 
call `LakeFSStorageClient.validateRepositoryName()` on the `datasetName`. 
   
   You can do the following here:
   - create a static function in `DatasetResource` named `validateDatasetName`. 
This function does the exact same thing as `validateRepositoryName`. 
   - call validateDatasetName here.
   
   This duplication is meaningful as it makes the semantic consistent. Also it 
provides the flexibility for us to modify the validation rule later, without 
affecting the lakefs repository name validation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to